As this paper is currently under review for publication, it cannot be publicly linked. Selected sections are shared here to illustrate the research process and proposed speculative solutions.

Sound of Surveillance

Speculative Futures for Voice Data in India

Introduction

The rustle of silk robes

The clink of armor

The hushed whispers in dimly lit corridors

Ancient India, as chronicled in Kautilya’s Arthashastra, understood that sound was never just sound.

It was power, intelligence, and governance.

Overview

March → August 2025

This research examines how India’s multilingual, orally rooted society interacts with voice-based technology — and what that means for privacy, governance, and user experience. Drawing from a survey of 500 participants and qualitative interviews, the study identifies four critical tension points that define India’s evolving relationship with voice interfaces: the unacknowledged biometric, the convenience paradox, the lingual gap, and task-specific trust.

Building on these findings, the work critiques the limitations of universal privacy models and proposes culturally situated, community-led frameworks for responsible design. Through speculative prototypes — Panchayat Data Vaults and the Bio-Acoustic Data Garden — it envisions a future where data systems are participatory, multilingual, and rooted in shared cultural values.

Ultimately, the research argues that India’s voice future must balance policy, technology, and community — designing systems not for surveillance or convenience, but for dignity, trust, and collective agency.

· Voice & Behavioral Research

· Emerging Technologies & Futures Design

· Systems Thinking & Design Policy

Individual Inquiry

“Surveillance through sound is not a modern phenomenon. It is a timeless tension between the collection of aural information and the human desire for private discourse.”

Foundational Research

Not all listening is the same

In today’s discourse, voice recognition and speech recognition are often conflated. But the distinction is crucial.

Voice Recognition

Also known as speaker recognition or voice biometrics → asks who is speaking.

It extracts unique vocal signatures: pitch, tone, cadence, and even the physical shape of the vocal tract.

These features generate voiceprints, mathematical representations as permanent as fingerprints, as unchangeable as DNA.
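
To make "mathematical representation" concrete, here is a minimal sketch of how a voiceprint comparison might work, assuming a voiceprint is stored as a fixed-length list of numbers. The three-number vectors, the `same_speaker` helper, and the 0.8 threshold are all hypothetical and chosen only for illustration; real systems derive much longer vectors from audio and tune their thresholds empirically.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("voiceprints must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(enrolled, sample, threshold=0.8):
    """Guess whether two voiceprints belong to one person.

    The threshold is an assumption for this sketch, not a standard value.
    """
    return cosine_similarity(enrolled, sample) >= threshold


# Toy, made-up voiceprints (not derived from real audio).
enrolled = [0.9, 0.1, 0.4]
same_person_later = [0.85, 0.15, 0.42]
someone_else = [0.1, 0.9, 0.2]

print(same_speaker(enrolled, same_person_later))
print(same_speaker(enrolled, someone_else))
```

Note that nothing in this comparison depends on what was said. The words are irrelevant; only the speaker's vocal signature is matched, which is why a stored voiceprint can identify someone across unrelated recordings.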

Speech Recognition

→ asks what is being said.

It converts spoken words into text or commands, focusing on semantic content for convenience and access.

The difference may seem technical, but it is profound.

Speech recognition empowers; voice recognition surveils.

Speech helps you play a song. Voice can be used to trace, target, and profile.

Every cheerful ‘Hey Google’ is not just a request → it is a biometric trace left behind.

Indian Context

Privacy here is not individual

It is collective

Global privacy frameworks like GDPR and CCPA assume individual ownership of devices and personal control over consent.

But in India, these assumptions fracture.

Privacy here is entangled with shared spaces, linguistic diversity, low literacy, and collective decision-making.

Consent isn’t always a choice — sometimes, it’s survival in the face of complexity.

The Indian Context:

When global models meet local realities

Low literacy

Low literacy rates mean users often cannot parse dense, legalistic consent forms, making truly "informed consent" logistically challenging.

Forms written in legal English exclude millions.

Shared spaces

In joint families, dormitories, and crowded homes, devices are rarely “personal.” A single voice assistant may serve an entire household.

Rapid adoption

Cheap data and smartphones have brought hundreds of millions online in the past decade, often without an understanding of digital risk.

Universal privacy models overlook India’s multilingual, collective, and oral reality.

Survey & Insights

Convenience wins until the stakes feel personal

To understand how voice technologies interact with India’s social, linguistic, and behavioral realities, we conducted a mixed-method study with 500 respondents across 12 urban and semi-urban centers.

Approach

The goal was to uncover two layers:

1) Everyday Awareness — How do users understand voice assistants, privacy, and control?

2) Behavioral Reality — How do they act when faced with trade-offs between ease, safety, and agency?

The study combined general questions with scenario-based prompts to trace both attitudes and actions.

1) General Questions

Do you know what voiceprints are?

82% said No

Have you ever refused voice permissions?

64% said No

2) Scenario-Based Prompts

Scenario 1

Would you prefer the ease of speaking commands, even if your voice is stored as biometric data? Or would you sacrifice convenience to protect your biometric privacy?

63% of respondents chose convenience

37% chose to protect biometric privacy

Scenario 2

Would you trust voice assistants to handle sensitive contexts like banking transactions or accessing personal health records?

80.4% said No

19.6% said Yes, but only with safeguards (OTP, MFA)
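
As a rough check on scale, the reported percentages can be converted back into approximate head counts. This is only a sketch: it assumes all 500 respondents answered every scenario prompt, which the study summary does not state, and the `approximate_count` helper is a hypothetical name used here for illustration.

```python
SAMPLE_SIZE = 500  # respondents reported in the study

# Percentages as reported for the two scenario prompts.
reported_shares = {
    "Scenario 1: chose convenience": 63.0,
    "Scenario 1: chose biometric privacy": 37.0,
    "Scenario 2: said No": 80.4,
    "Scenario 2: said Yes, with safeguards": 19.6,
}


def approximate_count(percent, sample_size=SAMPLE_SIZE):
    """Convert a reported percentage into a rounded respondent count.

    Assumes every respondent answered the prompt (an assumption, see above).
    """
    return round(percent * sample_size / 100)


counts = {label: approximate_count(p) for label, p in reported_shares.items()}

for label, n in counts.items():
    print(f"{label}: about {n} of {SAMPLE_SIZE}")
```

Under that assumption, the one-decimal figures in Scenario 2 correspond to whole numbers of people, which is consistent with a sample of 500.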

From the responses, four clear insights emerged about how people in India use and trust voice as a medium for interaction.

Insights

What We Learned

Voiceprints are invisible to users

76% didn’t know their voice can be stored as a biometric identifier—opacity undermines informed consent.

Ease beats caution by default

Immediate utility often outweighs abstract risk; 64% rarely or never read T&Cs.

Language is the interface

Accents and multilingual speech cause misrecognition. English bias limits adoption and marginalizes non-English speakers.

Trust is context-driven

Voice works for low-stakes tasks (music, trivia) but fails in sensitive contexts (finance, health).

Trust in voice is situational. Safe for playlists, unsafe for paychecks.

Voice Data Lifecycle

In voice data, nothing is ever truly gone.

The survey revealed how people behave with voice technology; the voice lifecycle uncovers how the system behaves in return.

To trace where control begins to slip inside this hidden infrastructure, we mapped how a single voice interaction travels through five stages — from collection to deletion.

1/5

Collection

Voiceprints are captured invisibly. Low digital literacy leads to proxy use, and shared devices often record children’s data without consent. In crowded homes, background conversations are swept in without awareness.

Voiceprints don’t vanish when consent is withdrawn — they echo, embedded in systems long after the speaker has fallen silent.

Speculative Solutions

Speculation is not prediction. It is rehearsal.

The survey helped us understand how people behave with voice technology.
The lifecycle revealed how the system behaves in return.

Together, they exposed the gap between human intent and system design.

By merging these insights with foundational research, we began to reimagine what a fairer voice ecosystem for India could look like.

Each speculative concept builds directly on the five stages of the voice lifecycle.

Conclusion