
VoiceSecure threat model. Credit: Nirupam Roy, University of Maryland.
When you make a voice call through Zoom, FaceTime, or WhatsApp, you’re sharing more than just what you say. You are revealing your age, gender, emotional state, social background, and personality. Your voice is a biometric fingerprint as unique as your face. And artificial intelligence is increasingly listening.
“We’re already seeing phishing based on online activity and email input,” said Nirupam Roy, an associate professor of computer science at the University of Maryland.
“With so much of our voice communications now flowing through digital platforms, there are unprecedented vulnerabilities to our own voice privacy. We expect threats around voice and voice data to become very real, especially as artificial intelligence comes into play.”
We worry about what we type in emails or post on social media, but every time we speak online, our voices inadvertently leak deeply personal information. In the wrong hands, voice data can enable targeted phishing attacks, deepfake generation, biometric theft, and even sophisticated social engineering.
Roy is working to address this growing threat to our personal safety. To protect human voice data from theft and misuse by malicious third parties, he and his UMD research group designed VoiceSecure, an innovative system that masks voices from artificial intelligence while keeping speech crystal clear to the human ear.
When every call becomes data mining
It’s not just the content of your conversations that has value to malicious attackers. Roy said the biggest challenge in addressing privacy issues is the “metalinguistic” information that the human voice conveys: emotions, biology, stress patterns, and identity markers.
“Government and military conversations often require strong protection against audio eavesdropping, but even low-risk conversations can reveal large amounts of information,” Roy said. “FaceTime conversations between mothers and sons can reveal sensitive personal information that could be used to create everything from targeted advertising to voice clones used for fraud.”

Nirupam Roy’s research group at the University of Maryland. Credit: John T. Consoli, University of Maryland.
Fraudsters and deepfake creators use AI-generated audio to make their schemes more convincing. Biometric theft allows unauthorized access to voice-authentication systems that guard bank accounts and patient health records. Advanced social engineering attacks also become far more effective when attackers use detailed profiles built from real human voice patterns and biometric details.
Roy pointed out that while companies and platforms already have steps in place to keep user data safe, these strategies are often inadequate in practice.
Some solutions add masking noise to voice conversations, which degrades call quality for users. Traditional encryption, the most commonly used defense, also faces significant challenges: content must be encrypted and decrypted in real time, consuming computing power that not all devices can comfortably sustain.
Incompatibility between users’ devices, such as desktop computers and mobile devices, can create security weaknesses that attackers can exploit.
“As communication systems become more complex, end users lose control over their data,” Roy said. “Even though many platforms have end-to-end encryption, these protections are often optional, difficult to implement, or simply not followed, making it easier for bad actors using tools like AI to exploit these weaknesses.”
Roy’s VoiceSecure system aims to address these limitations and combat malicious attacks by exploiting a key difference between humans and machines: how each processes voice.
“Human hearing has inherent limitations. People are not equally sensitive to all sound frequencies. For example, two sounds that are close together at a higher frequency often cannot be deciphered as different. Psychoacoustic effects determine how our brains understand sounds, and it is not just about frequency, but also sensitivity and context,” Roy explained.
“In contrast, machines treat every frequency as a separate data point with mathematical precision. They analyze every acoustic feature to identify the speaker and extract information.”
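To make that gap concrete, here is a minimal Python sketch (not code from the VoiceSecure project) that maps FFT frequency bins onto the Bark critical-band scale, a standard psychoacoustic model of human frequency resolution; the sampling rate and FFT size are illustrative assumptions. A speaker-identification pipeline treats each bin as an independent feature, while human hearing collapses many neighboring high-frequency bins into a single perceptual band.

```python
# Illustrative only: contrasts machine frequency resolution (one feature
# per FFT bin) with human resolution (roughly 24 Bark critical bands).
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark critical-band scale."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

sample_rate = 16_000                                  # typical VoIP rate (assumed)
n_fft = 512                                           # FFT size a model might use (assumed)
bin_hz = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)  # center frequency of each bin

bins_per_band = np.bincount(hz_to_bark(bin_hz).astype(int))

print(f"Features a machine distinguishes: {len(bin_hz)} FFT bins")
for band, count in enumerate(bins_per_band):
    print(f"Bark band {band:2d}: {count:3d} bins merge into one percept")
```

At the top of the spectrum, dozens of bins fall into a single Bark band. That slack is what masking in this style can exploit: changes there are salient to a classifier but largely inaudible to a listener.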
The VoiceSecure system uses AI-powered reinforcement learning to optimize voice signals, preserving the characteristics humans use to understand speech and recognize each other while suppressing the features that machines rely on for recognition and profiling.
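The article does not publish VoiceSecure’s learned objective, but the trade-off Roy describes can be sketched as the reward a reinforcement learner might maximize. Everything below is a hypothetical stand-in: `intelligibility_model` for a speech-intelligibility proxy (a STOI-style metric, say) and `speaker_id_model` for a pretrained speaker-matching score.

```python
# Hypothetical reward for a voice-masking policy (not the published
# VoiceSecure objective): keep a human-intelligibility proxy high while
# driving a speaker-ID model's match confidence low.
def privacy_reward(clean, masked, intelligibility_model, speaker_id_model,
                   alpha=1.0, beta=1.0):
    human_score = intelligibility_model(clean, masked)   # preserve: humans still understand
    machine_score = speaker_id_model(clean, masked)      # suppress: machines can't match the speaker
    return alpha * human_score - beta * machine_score
```

A learner proposing candidate transforms is rewarded for signals humans parse easily but machines cannot profile, with `alpha` and `beta` weighting the two goals.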
VoiceSecure acts as a microphone module that operates at the firmware or driver level, capturing and transforming voice data as early as possible in the communications pipeline, before it reaches the device’s operating system. Exploiting the gap between human and machine listening, Roy noted, can put a barrier between private conversations and unwanted AI eavesdropping.
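As a loose illustration of what such a capture-layer module might do, here is a minimal sketch that perturbs each microphone frame before it leaves the capture layer. The real system reportedly learns its transform; the fixed rule below (randomizing spectral phase above 4 kHz, where the Bark sketch above shows human resolution is coarse) and the `privacy_level` knob are assumptions for illustration.

```python
# Hypothetical capture-layer transform (not VoiceSecure's algorithm):
# keep the magnitude spectrum humans rely on, scramble the fine phase
# structure in the region where human frequency resolution is coarse.
import numpy as np

def mask_frame(frame, sample_rate=16_000, privacy_level=1.0, rng=None):
    """Randomize phase above ~4 kHz; `privacy_level` (0..1) is a made-up
    knob standing in for the user-tunable setting the article mentions."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    hf = freqs > 4000                        # coarse-resolution region (assumed cutoff)
    jitter = rng.uniform(-np.pi, np.pi, hf.sum()) * privacy_level
    spectrum[hf] *= np.exp(1j * jitter)      # keep magnitude, scramble phase

    return np.fft.irfft(spectrum, n=len(frame))

# Applied per frame at capture time, before audio reaches the OS or any app.
frame = np.random.randn(512)                 # stand-in for one microphone frame
protected = mask_frame(frame, privacy_level=0.8)
```

The tunable `privacy_level` here mirrors the user-controlled privacy setting described later in the article.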

VoiceSecure prototype. Credit: Nirupam Roy, University of Maryland.
“Voice communication is very personal, so we wanted to maintain a human quality within the system. A mother needs to be able to recognize her son’s voice during a call, while automated AI monitoring systems should not be able to identify the speaker or extract sensitive biometric data,” Roy said.
“The key to this work is to play with the gap between what humans hear and what machines hear.”
Roy and his team have already successfully tested modified voices from VoiceSecure with real users, confirming that while the conversations are incomprehensible to machines, they are understandable to humans.
Users can also customize their preferred level of privacy and maintain control of their voice without relying on the behavior or technology of other parties, such as conversation partners or communication platforms.
The team hopes to work with engineers and industry partners to package the system as installable software that can be applied to any computer or smart device.
Meanwhile, Roy points out that human vigilance is as essential as technological defenses to protect digital systems and privacy.
“Awareness is key to ensuring security when humans are involved,” he said.
In collaboration with UMD Information Science Professor Megha Subramaniam and University of Maryland, Baltimore County Assistant Professor of Computer Science Sanorita Dey, Roy launched Cyber-Ninja, an AI-driven platform that turns cybersecurity training into an interactive, game-like experience.
Cyber-Ninja is designed for teens and seniors to help users detect and avoid phishing attacks while building critical thinking skills and digital confidence. The team has already successfully conducted workshops at libraries across Maryland, demonstrating how AI-powered education can strengthen communities’ resilience to evolving digital threats.
“From customer service chatbots to robot vacuums to embodied devices like Alexa, artificial intelligence is truly ingrained in our lives, and as AI becomes more physically present, the need for robust privacy protections becomes even more urgent,” Roy said.
“We want AI to evolve because it can be very effective, but it’s important that we also evolve our own defense mechanisms to deal with evolving threats.”
Provided by University of Maryland
