The prototype AI system Target Speech Hearing lets users select a single person's voice to remain audible while all other sounds are canceled out. Although still a proof of concept, its creators are in talks to embed it in popular noise-canceling earbuds and to make it available for hearing aids. (Source: Image by RR)

Selective Hearing Made Possible with New AI Noise-Canceling Headphones

Life can be noisy, and while noise-canceling headphones reduce environmental sound, they muffle everything indiscriminately, so users miss sounds they actually want to hear. A new prototype AI system called Target Speech Hearing aims to solve this by letting users select a specific person's voice to remain audible while other noise is canceled out. The technology is currently a proof of concept, but its creators are in talks to embed it in popular brands of noise-canceling earbuds and to make it available for hearing aids. The system captures an audio sample of the desired speaker during an "enrollment" step and then uses neural networks to continuously separate that voice from the surrounding noise and prioritize it for playback.
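
In rough terms, that flow looks like the sketch below: a short enrollment clip is turned into a speaker embedding, and a separation network conditioned on that embedding filters each chunk of incoming audio. The PyTorch modules, layer sizes, and tensor shapes here are illustrative assumptions, not the researchers' actual architecture.

```python
# Minimal enroll-then-separate sketch (assumed architecture, toy layer sizes).
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Maps a short enrollment clip to a fixed-size speaker embedding (assumed)."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Conv1d(1, 32, 9, stride=4), nn.ReLU(),
                                 nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                 nn.Linear(32, emb_dim))
    def forward(self, clip):                      # clip: (batch, 1, samples)
        return self.net(clip)

class ConditionedSeparator(nn.Module):
    """Predicts a mask over the noisy mixture, conditioned on the embedding (assumed)."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.cond = nn.Linear(emb_dim, 32)
        self.enc = nn.Conv1d(1, 32, 9, padding=4)
        self.dec = nn.Conv1d(32, 1, 9, padding=4)
    def forward(self, mixture, speaker_emb):      # mixture: (batch, 1, samples)
        feats = self.enc(mixture) + self.cond(speaker_emb).unsqueeze(-1)
        mask = torch.sigmoid(self.dec(torch.relu(feats)))
        return mixture * mask                     # estimate of the target speech

# "Enrollment": one clip of the target speaker captured during the button press.
enc, sep = SpeakerEncoder(), ConditionedSeparator()
enrollment_clip = torch.randn(1, 1, 16000)        # ~1 s at 16 kHz (placeholder audio)
speaker_emb = enc(enrollment_clip)

# Streaming use: each incoming mixture chunk is filtered with the stored embedding.
chunk = torch.randn(1, 1, 1600)                   # ~100 ms chunk (placeholder)
target_only = sep(chunk, speaker_emb)
```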

The researchers previously trained a neural network to recognize and filter out sounds such as crying babies and ringing alarms, but isolating a single human voice is a tougher challenge because of the complexity of speech. To make the models run in real time on the limited computing power and battery life of headphones, the team used an AI compression technique called knowledge distillation. According to a story on technologyreview.com, this involved training a smaller model to imitate the performance of a larger model trained on millions of voices. The smaller model extracts vocal patterns from the audio captured by the headphones' microphones and uses them to isolate the targeted speaker's voice.
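
The core idea of knowledge distillation can be shown in a few lines: a compact "student" network is trained to reproduce the output of a larger, frozen "teacher". The sketch below, written with PyTorch and toy mask-based separators, is a generic illustration of that idea; the model sizes, loss, and training data are placeholder assumptions, not details from the paper.

```python
# Generic knowledge-distillation sketch: small student imitates a large teacher.
import torch
import torch.nn as nn

def make_separator(width):
    # Toy mask-based separator; larger `width` stands in for the big, offline-trained model.
    return nn.Sequential(nn.Conv1d(1, width, 9, padding=4), nn.ReLU(),
                         nn.Conv1d(width, 1, 9, padding=4), nn.Sigmoid())

teacher = make_separator(width=256)   # stand-in for the large model trained on many voices
student = make_separator(width=16)    # small model meant to fit the headset's compute budget
for p in teacher.parameters():
    p.requires_grad_(False)           # the teacher stays frozen during distillation

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(100):                               # placeholder loop on random "mixtures"
    mixture = torch.randn(8, 1, 4000)
    with torch.no_grad():
        teacher_out = mixture * teacher(mixture)      # teacher's estimate of the target speech
    student_out = mixture * student(mixture)
    loss = nn.functional.l1_loss(student_out, teacher_out)  # imitate the teacher's output
    opt.zero_grad(); loss.backward(); opt.step()
```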

The system activates when the wearer holds down a button on the headphones while facing the target speaker, capturing an audio sample and extracting the speaker's vocal characteristics. These characteristics are fed into a second neural network running on a microcontroller connected to the headphones, which separates the chosen voice from the rest of the sound and prioritizes it for playback to the listener. The more the system focuses on a speaker's voice, the better it becomes at isolating it, even if the wearer turns away from the speaker.
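
The on-device control flow implied here can be sketched as a simple loop: a button press triggers enrollment, after which every incoming microphone chunk is filtered with the stored speaker characteristics and played back. The audio I/O calls and helper functions below are hypothetical placeholders, not the researchers' firmware.

```python
# Hypothetical sketch of the headset-side control flow (all functions are placeholders).
import numpy as np

CHUNK = 160 * 8                      # e.g. ~80 ms at 16 kHz (assumed frame size)

def read_mic_chunk():
    """Placeholder for the headphone microphone stream."""
    return np.random.randn(CHUNK).astype(np.float32)

def embed_speaker(clip):
    """Placeholder for the enrollment network's speaker embedding."""
    return clip[:128].copy()

def separate(chunk, speaker_emb):
    """Placeholder for the on-device separation network."""
    gain = 1.0 / (1.0 + np.abs(speaker_emb).mean())   # stand-in for a learned mask
    return chunk * gain

def play_to_ear(chunk):
    """Placeholder for the earbud's audio output."""
    pass

# 1) Wearer holds the button while facing the speaker: capture an enrollment clip.
enrollment_clip = np.concatenate([read_mic_chunk() for _ in range(40)])  # a few seconds
speaker_emb = embed_speaker(enrollment_clip)

# 2) Afterwards, each incoming chunk is separated and played back with low latency.
for _ in range(100):                 # stand-in for the continuous playback loop
    play_to_ear(separate(read_mic_chunk(), speaker_emb))
```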

The system can currently enroll a speaker only when that person's voice is the loudest in the vicinity, but the team aims to make enrollment work even when the loudest voice is not the target speaker's. Experts in the field, such as Sefik Emre Eskimez of Microsoft and Samuele Cornell of Carnegie Mellon University, see great potential for real-world applications, particularly in scenarios like meetings, and regard the work as a significant step forward in AI-driven speech separation.

read more at technologyreview.com