AI headphones let you listen to only a single person in a crowd

May 24, 2024
Interesting privacy issue

A University of Washington team has developed an AI system that lets a user wearing headphones “enroll” a speaker by looking at them for three to five seconds, and then listen only to that person.

Their “Target Speech Hearing” app then cancels all other sounds in the environment and plays just the enrolled speaker’s voice in real time, even if the listener moves around in noisy places and no longer faces the speaker.

How it works

To use the system, a person wearing off-the-shelf headphones fitted with microphones taps a button while directing their head at someone talking. Because the wearer is facing the speaker, the sound waves from that speaker’s voice should reach the microphones on both sides of the headset at the same time.
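The enrollment cue described above, a voice arriving at both microphones at about the same time, can be illustrated with a small sketch. This is not the team's code: it is a minimal NumPy example that assumes two equal-rate microphone channels and uses plain cross-correlation to estimate the arrival delay between them. The function names and the `max_lag` tolerance are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_delay_samples(left, right):
    """Estimate how many samples `right` lags behind `left`.

    Uses the peak of the full cross-correlation. A positive result means
    the sound reached the left microphone first.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # corr[i] corresponds to lag lags[i], where corr at lag k is
    # sum_n right[n + k] * left[n].
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    return int(lags[np.argmax(corr)])


def is_facing_speaker(left, right, max_lag=1):
    """Hypothetical check: treat the wearer as facing the speaker when the
    inter-microphone delay is within `max_lag` samples (an assumed tolerance)."""
    return abs(estimate_delay_samples(left, right)) <= max_lag


# Synthetic demo: white noise stands in for a voice signal.
rng = np.random.default_rng(0)
voice = rng.standard_normal(1000)

# Speaker straight ahead: both microphones receive the same signal.
front_left, front_right = voice, voice.copy()

# Speaker off to the left: the right channel is delayed by 3 samples.
side_left = voice
side_right = np.concatenate([np.zeros(3), voice[:-3]])

front_delay = estimate_delay_samples(front_left, front_right)
side_delay = estimate_delay_samples(side_left, side_right)
```

In this synthetic setup the front-facing case yields a delay of zero samples, while the off-axis case yields a nonzero delay and fails the check. A real system would have to cope with noise, reverberation and competing talkers, which this sketch ignores.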

The headphones send that signal to an on-board embedded computer, where the team’s machine learning software learns the desired speaker’s vocal patterns. The system latches onto that speaker’s voice and continues to play it back to the listener, even as the pair moves around. The system’s ability to focus on the enrolled voice improves as the speaker keeps talking, because the additional speech gives the system more training data.

This work builds on the team’s previous “semantic hearing” research, which allowed users to select specific sound classes—such as birds or voices—that they wanted to hear, and automatically cancel other sounds in the environment.

The team plans to use the Target Speech Hearing app with earbuds and hearing aids in the future. The code for the proof-of-concept device is available for others to build on, but the system is not commercially available.

Citation: Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota. Look Once to Hear: Target Speech Hearing with Noisy Examples. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24), May 2024, Article No. 37, pp. 1–16 (open source)

