June 29, 2024

TechNewsInsight


University of Washington's AI-powered headphones make it easier to hear one person in a crowd – InfoQ


Original link (2024-06-01)

Target Speech Hearing is a new deep learning algorithm developed at the University of Washington. The wearer "enrolls" a target speaker, and the system then cancels all of the environmental noise surrounding that person's voice.

Currently, the system requires the person wearing the headphones to tap a button while looking at the speaker for three to five seconds. This lets a deep learning model learn the speaker's vocal patterns and latch onto them, so the system can keep playing that voice even if the listener moves around or is no longer looking at the speaker.

A naive approach is to ask for a clean example of the target speaker's speech to use for enrollment. However, this is not well suited to hearable applications, because obtaining clean speech examples is difficult in real-world scenarios and creates its own user-interface problems. We present the first enrollment interface in which the wearer looks at the target speaker for a few seconds to capture a short, highly noisy, binaural example of that speaker's speech.
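As a rough sketch of what such an enrollment capture could look like in practice, the snippet below records a few seconds of two-channel audio while the wearer faces the speaker. It is an illustrative example only: it assumes the headset's left and right microphones appear as a stereo input device and uses the Python sounddevice library, which is not the team's actual capture code.

```python
# Hedged sketch: capture a few seconds of binaural (2-channel) audio while the
# wearer looks at the target speaker. Assumes the headset's ear microphones are
# exposed as a stereo input device; not the authors' actual capture code.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16_000        # assumed device sample rate
ENROLL_SECONDS = 4          # the interface described above uses roughly 3-5 s

def capture_enrollment_clip() -> np.ndarray:
    """Record a short, noisy binaural enrollment example (shape: samples x 2)."""
    clip = sd.rec(int(ENROLL_SECONDS * SAMPLE_RATE),
                  samplerate=SAMPLE_RATE, channels=2)
    sd.wait()               # block until the recording finishes
    return clip
```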

What matters in this enrollment step is that the wearer looks in the direction of the target speaker: as a result, the target speaker's voice is aligned across the two binaural microphones, while other, overlapping speakers are very likely misaligned. This noisy example is used by a neural network to learn the characteristics of the target speaker and extract a corresponding embedding vector. A separate neural network then uses this embedding to extract the target speaker's voice from the surrounding cacophony.
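The paragraph above describes a two-network design: one network turns the noisy binaural enrollment clip into a speaker embedding, and a second network uses that embedding to pull the target voice out of the mixture. The sketch below illustrates that structure only; the module names and layer choices are placeholders, not the published architecture, and the team's open-source release should be consulted for the real thing.

```python
# Hedged sketch of the "enroll, then extract" pipeline described above.
# EnrollmentNet and ConditionalSeparator are illustrative placeholders.
import torch
import torch.nn as nn

class EnrollmentNet(nn.Module):
    """Maps the short, noisy binaural enrollment clip to a speaker embedding."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.encoder = nn.Conv1d(2, 64, kernel_size=400, stride=160)  # 2 ear channels
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, binaural_clip: torch.Tensor) -> torch.Tensor:
        # binaural_clip: (batch, 2, samples), roughly 3-5 s of audio.
        feats = torch.relu(self.encoder(binaural_clip))
        pooled = self.pool(feats).squeeze(-1)
        return self.proj(pooled)                      # (batch, embed_dim)

class ConditionalSeparator(nn.Module):
    """Extracts only the target voice from a binaural mixture, conditioned on
    the speaker embedding (a crude stand-in for the real separation network)."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.encoder = nn.Conv1d(2, 256, kernel_size=400, stride=160)
        self.condition = nn.Linear(embed_dim, 256)    # scale features by the embedding
        self.decoder = nn.ConvTranspose1d(256, 1, kernel_size=400, stride=160)

    def forward(self, mixture: torch.Tensor, speaker_emb: torch.Tensor) -> torch.Tensor:
        feats = torch.relu(self.encoder(mixture))
        feats = feats * self.condition(speaker_emb).unsqueeze(-1)
        return self.decoder(feats)                    # estimated target-only waveform

# Usage: enroll once while facing the speaker, then keep extracting their voice.
enroller, separator = EnrollmentNet(), ConditionalSeparator()
enroll_clip = torch.randn(1, 2, 16_000 * 4)           # ~4 s noisy binaural enrollment clip
embedding = enroller(enroll_clip)
mixture = torch.randn(1, 2, 16_000)                    # 1 s of the live noisy mixture
target_only = separator(mixture, embedding)
```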

According to the researchers, this is a major advance over current noise-cancelling headphones, which can effectively cancel all sounds but cannot selectively let a particular speaker through based on their vocal characteristics.


To make this possible, the research team built on a state-of-the-art speech separation network, TFGridNet. Along the way they had to solve several problems, including optimizing the network to run in real time on an embedded CPU and finding ways to train it on synthetic data so that the resulting system would generalize to unseen speakers in the real world.
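Running a separator in real time on an embedded CPU means processing the audio in short chunks and keeping the per-chunk compute well under the chunk's duration. The sketch below illustrates only that constraint, using a tiny placeholder model in place of the team's optimized TFGridNet implementation; the chunk size and sample rate are assumptions.

```python
# Hedged sketch of chunked, real-time-style inference on CPU. The tiny Conv1d
# stands in for the optimized separation network; the point is that each chunk
# must be processed faster than the chunk's own duration.
import time
import torch
import torch.nn as nn

SAMPLE_RATE = 16_000
CHUNK = 128                                    # 8 ms of audio per chunk at 16 kHz
BUDGET_MS = CHUNK / SAMPLE_RATE * 1000         # compute must finish within ~8 ms

model = nn.Conv1d(2, 1, kernel_size=3, padding=1).eval()   # placeholder separator
stream = torch.randn(2, SAMPLE_RATE * 5)                    # 5 s of binaural input

with torch.no_grad():
    for start in range(0, stream.shape[1], CHUNK):
        chunk = stream[:, start:start + CHUNK].unsqueeze(0)   # (1, 2, CHUNK)
        t0 = time.perf_counter()
        _out = model(chunk)                                    # (1, 1, CHUNK)
        elapsed_ms = (time.perf_counter() - t0) * 1000
        if elapsed_ms > BUDGET_MS:
            print(f"chunk at sample {start} overran its {BUDGET_MS:.1f} ms budget")
```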

Shyam Gollakota, one of the researchers and a long-time contributor to work on semantic hearing, says the project differs from mainstream AI efforts in that it uses on-device artificial intelligence to enhance people's auditory perception, without relying on cloud services, a goal he stresses the project is deliberately pursuing.

Currently, the system can enroll only one speaker at a time. Another limitation is that enrollment succeeds only when no other loud voice is coming from the same direction as the target speaker; if the user is not satisfied with the first result, however, they can run the enrollment again on that speaker to improve clarity.
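This same-direction limitation follows from the enrollment cue itself: a voice the wearer is facing reaches both ear microphones at essentially the same time, so the left/right cross-correlation peaks near zero lag, while an off-axis voice peaks at a nonzero lag; a second loud voice from the same direction is also aligned at zero lag and cannot be told apart by this cue. The snippet below is a minimal synthetic illustration of that intuition, not the authors' code.

```python
# Hedged illustration (synthetic white-noise "voices"): the enrollment cue is
# inter-microphone alignment. A frontal voice peaks at zero lag in the
# left/right cross-correlation; an off-axis voice peaks at a nonzero lag.
import numpy as np

rng = np.random.default_rng(0)
fs = 16_000

def dominant_lag(left: np.ndarray, right: np.ndarray, max_lag: int = 20) -> int:
    """Lag (in samples) at which the two ear signals line up best."""
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(np.roll(left, -lag), right) for lag in lags]
    return int(lags[int(np.argmax(scores))])

frontal = rng.standard_normal(fs)    # voice the wearer faces: identical at both ears
off_axis = rng.standard_normal(fs)   # voice from one side: delayed at the far ear
delay = 8                            # ~0.5 ms inter-ear delay, in samples

print(dominant_lag(frontal, frontal))                     # expected 0: aligned, frontal
print(dominant_lag(np.roll(off_axis, delay), off_axis))   # expected 8: misaligned, off-axis

# Two loud voices from the same direction are both aligned at zero lag,
# which is why enrollment fails in that case: the cue cannot tell them apart.
two_frontal_mix = frontal + rng.standard_normal(fs)
print(dominant_lag(two_frontal_mix, two_frontal_mix))     # still 0 for both voices
```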

The research team has open-sourced its code and datasets, and it encourages future research to improve target speech hearing.
