As this paper is currently under review for publication, it cannot be publicly linked. Selected sections are shared here to illustrate the research process and proposed speculative solutions.
Sound of Surveillance
Speculative Futures for Voice Data in India
Introduction
The rustle of silk robes
The clink of armor
The hushed whispers in
dimly lit corridors
Ancient India, as chronicled in Kautilya’s Arthashastra, understood that sound was never just sound.
It was power, intelligence, and governance.
Overview
March → August 2025
This research examines how India’s multilingual, orally rooted society interacts with voice-based technology — and what that means for privacy, governance, and user experience. Drawing from a survey of 500 participants and qualitative interviews, the study identifies four critical tension points that define India’s evolving relationship with voice interfaces: the unacknowledged biometric, the convenience paradox, the lingual gap, and task-specific trust.
Building on these findings, the work critiques the limitations of universal privacy models and proposes culturally situated, community-led frameworks for responsible design. Through speculative prototypes — Panchayat Data Vaults and the Bio-Acoustic Data Garden — it envisions a future where data systems are participatory, multilingual, and rooted in shared cultural values.
Ultimately, the research argues that India’s voice future must balance policy, technology, and community — designing systems not for surveillance or convenience, but for dignity, trust, and collective agency.
· Voice & Behavioral Research
· Emerging Technologies & Futures Design
· Systems Thinking & Design Policy
Individual Inquiry


Pull Quote
“Surveillance through sound is not a modern phenomenon. It is a timeless tension between the collection of aural information and the human desire for private discourse.”
Foundational Research
Not all listening is the same
In today’s discourse, voice recognition and speech recognition are often conflated.
But the distinction is crucial.
Voice Recognition
Also known as speaker recognition or voice biometrics → asks who is speaking.
It extracts unique vocal signatures: pitch, tone, cadence, and even the physical shape of the vocal tract.
These features generate voiceprints, mathematical representations as permanent as fingerprints, as unchangeable as DNA.
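To make the idea of a voiceprint as a "mathematical representation" concrete, here is a minimal sketch of how a speaker-verification step might compare two voiceprints. It assumes each voiceprint is a short list of numbers and that similarity is measured with cosine similarity against a fixed threshold; the vectors, the threshold of 0.8, and the function names are all illustrative assumptions, not details from this research or from any specific product.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(enrolled, sample, threshold=0.8):
    """Hypothetical check: treat the sample as the enrolled speaker
    when the two voiceprints point in nearly the same direction."""
    return cosine_similarity(enrolled, sample) >= threshold


# Made-up 4-dimensional voiceprints; real systems use far larger vectors.
enrolled_voiceprint = [0.9, 0.1, 0.4, 0.2]
new_sample_same = [0.85, 0.15, 0.38, 0.25]
new_sample_other = [0.1, 0.9, 0.2, 0.7]

print(same_speaker(enrolled_voiceprint, new_sample_same))
print(same_speaker(enrolled_voiceprint, new_sample_other))
```

The point of the sketch is that the comparison never looks at what was said, only at who the numbers resemble, which is why a stored voiceprint behaves like an identifier rather than a transcript.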
Speech recognition
→ asks what is being said.
It converts spoken words into text or commands, focusing on semantic content for convenience and access.
The difference may seem technical, but it is profound.
Speech recognition empowers; voice recognition surveils.
Speech helps you play a song. Voice can be used to trace, target, and profile.
Pull Quote
Every cheerful ‘Hey Google’ is not just a request → it is a biometric trace left behind.
Indian Context
Privacy here is not individual
It is collective
Global privacy frameworks like GDPR and CCPA assume individual ownership of devices and personal control over consent.
But in India, these assumptions fracture.
Privacy here is entangled with shared spaces, linguistic diversity, low literacy, and collective decision-making.
Consent isn’t always a choice — sometimes, it’s survival in the face of complexity.
The Indian Context:
When global models meet local realities
Low literacy
Low literacy rates mean users often cannot parse dense, legalistic consent forms, making truly "informed consent" logistically challenging.
Forms written in legal English exclude millions.
Shared spaces
In joint families, dormitories, and crowded homes, devices are rarely “personal”; a single voice assistant may serve an entire household.
Rapid adoption
Cheap data and smartphones have brought hundreds of millions online in the past decade, often without literacy in digital risk.
Pull Quote
Universal privacy models overlook India’s multilingual, collective, and oral reality.
Survey & Insights
Convenience wins
until the stakes feel personal
To understand how voice technologies interact with India’s social, linguistic, and behavioral realities, we conducted a mixed-method study with 500 respondents across 12 urban and semi-urban centers.
Approach
The goal was to uncover two layers:
1) Everyday Awareness — How do users understand voice assistants, privacy, and control?
2) Behavioral Reality — How do they act when faced with trade-offs between ease, safety, and agency?
The study combined general questions with scenario-based prompts to trace both attitudes and actions.
1) General Questions
Do you know what voiceprints are?
82% said No
Have you ever refused voice permissions?
64% said No
2) Scenario-Based Prompts
Scenario 1
Would you prefer the ease of speaking commands, even if your voice is stored as biometric data? Or would you sacrifice convenience to protect your biometric privacy?
63% of respondents chose convenience
37% chose to protect biometric privacy
Scenario 2
Would you trust voice assistants to handle sensitive contexts like banking transactions or accessing personal health records?
80.4% said No
19.6% said Yes, but only with safeguards (OTP, MFA)
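As a rough aid for reading the scenario percentages, the sketch below converts them into approximate respondent counts and a 95% margin of error. It assumes all 500 respondents answered each scenario and that the sample can be treated as a simple random sample; neither assumption is confirmed by the study, so the margins are indicative only. The function names are illustrative.

```python
import math

SAMPLE_SIZE = 500  # respondents reported in the study


def respondents(percent, n=SAMPLE_SIZE):
    """Approximate head count behind a reported percentage."""
    return round(n * percent / 100)


def margin_of_error(percent, n=SAMPLE_SIZE, z=1.96):
    """95% margin of error in percentage points, using the normal
    approximation for a proportion (assumes simple random sampling)."""
    p = percent / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)


scenario_results = {
    "Scenario 1, chose convenience": 63.0,
    "Scenario 1, chose biometric privacy": 37.0,
    "Scenario 2, said No": 80.4,
    "Scenario 2, said Yes with safeguards": 19.6,
}

for label, pct in scenario_results.items():
    print(f"{label}: {pct}% is about {respondents(pct)} of {SAMPLE_SIZE} "
          f"(+/- {margin_of_error(pct):.1f} points)")
```

Under these assumptions, both scenario splits sit well outside their margins of error from an even 50/50 split, which is consistent with reading them as clear majorities rather than noise.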
From the responses, four clear insights emerged about how people in India use and trust voice as a medium for interaction.
Insights
What We Learned
Voiceprints are invisible to users
76% didn’t know their voice can be stored as a biometric identifier—opacity undermines informed consent.
Ease beats caution by default
Immediate utility often outweighs abstract risk; 64% rarely or never read T&Cs.
Language is the interface
Accent/multilingual realities cause misrecognition. English bias limits adoption and marginalizes non-English speakers.
Trust is context-driven
Voice works for low-stakes tasks (music, trivia) but fails in sensitive contexts (finance, health).
Pull Quote
Trust in voice is situational. Safe for playlists, unsafe for paychecks.
Voice Data Lifecycle
In voice data, nothing is ever truly gone.
The survey revealed how people behave with voice technology; the voice lifecycle uncovers how the system behaves in return.
To trace where control begins to slip inside this hidden infrastructure, we mapped how a single voice interaction travels through five stages — from collection to deletion.

1/5
Collection
Voiceprints are captured invisibly. Low digital literacy leads to proxy use, and shared devices often record children’s data without consent. In crowded homes, background conversations are swept in without awareness.
Pull Quote
Voiceprints don’t vanish when consent is withdrawn — they echo, embedded in systems long after the speaker has fallen silent.
Speculative Solutions
Speculation is not prediction. It is rehearsal.
The survey helped us understand how people behave with voice technology.
The lifecycle revealed how the system behaves in return.
Together, they exposed the gap between human intent and system design.
By merging these insights with foundational research, we began to reimagine what a fairer voice ecosystem for India could look like.
Each speculative concept builds directly on the five stages of the voice lifecycle.
Conclusion




