June 29, 2024

TechNewsInsight


University of Washington's AI-powered headphones make it easier to hear one person in a crowd – InfoQ


Original link (2024-06-01)

Target Speech Hearing is a new deep learning algorithm developed at the University of Washington. The wearer "enrolls" a target speaker, and the system then cancels all of the environmental noise surrounding that person's voice.

Currently, the system requires the person wearing the headphones to tap a button while looking at the speaker for three to five seconds. This lets a deep learning model learn the speaker's vocal patterns and latch onto them, so the system can keep playing that voice even if the listener moves around or is no longer looking at the speaker.

A naive approach is to ask for a clean example of the target speaker's speech to use for enrollment. However, this is not well suited to hearable applications, because obtaining clean speech examples is difficult in real-world scenarios and creates its own user-interface problems. We present the first enrollment interface in which the wearer looks at the target speaker for a few seconds to capture a short, highly noisy, binaural example of that speaker's speech.
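As a rough sketch of what such an enrollment capture could look like in practice, the snippet below records a few seconds of two-channel audio while the wearer faces the speaker. It is an illustrative example only: it assumes the headset's left and right microphones appear as a stereo input device and uses the Python sounddevice library, which is not the team's actual capture code.

```python
# Hedged sketch: capture a few seconds of binaural (2-channel) audio while the
# wearer looks at the target speaker. Assumes the headset's ear microphones are
# exposed as a stereo input device; not the authors' actual capture code.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16_000        # assumed device sample rate
ENROLL_SECONDS = 4          # the interface described above uses roughly 3-5 s

def capture_enrollment_clip() -> np.ndarray:
    """Record a short, noisy binaural enrollment example (shape: samples x 2)."""
    clip = sd.rec(int(ENROLL_SECONDS * SAMPLE_RATE),
                  samplerate=SAMPLE_RATE, channels=2)
    sd.wait()               # block until the recording finishes
    return clip
```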

What matters in this enrollment step is that the wearer looks in the direction of the target speaker: as a result, the target speaker's voice is aligned across the two binaural microphones, while other, overlapping speakers are very likely misaligned. This noisy example is used by a neural network to learn the characteristics of the target speaker and extract a corresponding embedding vector. A separate neural network then uses this embedding to extract the target speaker's voice from the surrounding cacophony.
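The paragraph above describes a two-network design: one network turns the noisy binaural enrollment clip into a speaker embedding, and a second network uses that embedding to pull the target voice out of the mixture. The sketch below illustrates that structure only; the module names and layer choices are placeholders, not the published architecture, and the team's open-source release should be consulted for the real thing.

```python
# Hedged sketch of the "enroll, then extract" pipeline described above.
# EnrollmentNet and ConditionalSeparator are illustrative placeholders.
import torch
import torch.nn as nn

class EnrollmentNet(nn.Module):
    """Maps the short, noisy binaural enrollment clip to a speaker embedding."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.encoder = nn.Conv1d(2, 64, kernel_size=400, stride=160)  # 2 ear channels
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, binaural_clip: torch.Tensor) -> torch.Tensor:
        # binaural_clip: (batch, 2, samples), roughly 3-5 s of audio.
        feats = torch.relu(self.encoder(binaural_clip))
        pooled = self.pool(feats).squeeze(-1)
        return self.proj(pooled)                      # (batch, embed_dim)

class ConditionalSeparator(nn.Module):
    """Extracts only the target voice from a binaural mixture, conditioned on
    the speaker embedding (a crude stand-in for the real separation network)."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.encoder = nn.Conv1d(2, 256, kernel_size=400, stride=160)
        self.condition = nn.Linear(embed_dim, 256)    # scale features by the embedding
        self.decoder = nn.ConvTranspose1d(256, 1, kernel_size=400, stride=160)

    def forward(self, mixture: torch.Tensor, speaker_emb: torch.Tensor) -> torch.Tensor:
        feats = torch.relu(self.encoder(mixture))
        feats = feats * self.condition(speaker_emb).unsqueeze(-1)
        return self.decoder(feats)                    # estimated target-only waveform

# Usage: enroll once while facing the speaker, then keep extracting their voice.
enroller, separator = EnrollmentNet(), ConditionalSeparator()
enroll_clip = torch.randn(1, 2, 16_000 * 4)           # ~4 s noisy binaural enrollment clip
embedding = enroller(enroll_clip)
mixture = torch.randn(1, 2, 16_000)                    # 1 s of the live noisy mixture
target_only = separator(mixture, embedding)
```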

According to the researchers, this is a major advance over current noise-cancelling headphones, which can effectively cancel all sounds but cannot selectively let a particular speaker through based on their vocal characteristics.


To make this possible, the research team built on a state-of-the-art speech separation network, TFGridNet. Along the way they had to solve several problems, including optimizing the network to run in real time on an embedded CPU and finding ways to train it on synthetic data so that the resulting system would generalize to unseen speakers in the real world.
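Running a separator in real time on an embedded CPU means processing the audio in short chunks and keeping the per-chunk compute well under the chunk's duration. The sketch below illustrates only that constraint, using a tiny placeholder model in place of the team's optimized TFGridNet implementation; the chunk size and sample rate are assumptions.

```python
# Hedged sketch of chunked, real-time-style inference on CPU. The tiny Conv1d
# stands in for the optimized separation network; the point is that each chunk
# must be processed faster than the chunk's own duration.
import time
import torch
import torch.nn as nn

SAMPLE_RATE = 16_000
CHUNK = 128                                    # 8 ms of audio per chunk at 16 kHz
BUDGET_MS = CHUNK / SAMPLE_RATE * 1000         # compute must finish within ~8 ms

model = nn.Conv1d(2, 1, kernel_size=3, padding=1).eval()   # placeholder separator
stream = torch.randn(2, SAMPLE_RATE * 5)                    # 5 s of binaural input

with torch.no_grad():
    for start in range(0, stream.shape[1], CHUNK):
        chunk = stream[:, start:start + CHUNK].unsqueeze(0)   # (1, 2, CHUNK)
        t0 = time.perf_counter()
        _out = model(chunk)                                    # (1, 1, CHUNK)
        elapsed_ms = (time.perf_counter() - t0) * 1000
        if elapsed_ms > BUDGET_MS:
            print(f"chunk at sample {start} overran its {BUDGET_MS:.1f} ms budget")
```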

Shyam Gollakota, one of the researchers and a long-time contributor to work on semantic hearing, says the project differs from mainstream AI efforts in that it uses on-device artificial intelligence to enhance people's auditory perception, without relying on cloud services, a goal he stresses the project is deliberately pursuing.

Currently, the system can enroll only one speaker at a time. Another limitation is that enrollment succeeds only when no other loud voice is coming from the same direction as the target speaker; if the user is not satisfied with the first result, however, they can run the enrollment again on that speaker to improve clarity.
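This same-direction limitation follows from the enrollment cue itself: a voice the wearer is facing reaches both ear microphones at essentially the same time, so the left/right cross-correlation peaks near zero lag, while an off-axis voice peaks at a nonzero lag; a second loud voice from the same direction is also aligned at zero lag and cannot be told apart by this cue. The snippet below is a minimal synthetic illustration of that intuition, not the authors' code.

```python
# Hedged illustration (synthetic white-noise "voices"): the enrollment cue is
# inter-microphone alignment. A frontal voice peaks at zero lag in the
# left/right cross-correlation; an off-axis voice peaks at a nonzero lag.
import numpy as np

rng = np.random.default_rng(0)
fs = 16_000

def dominant_lag(left: np.ndarray, right: np.ndarray, max_lag: int = 20) -> int:
    """Lag (in samples) at which the two ear signals line up best."""
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(np.roll(left, -lag), right) for lag in lags]
    return int(lags[int(np.argmax(scores))])

frontal = rng.standard_normal(fs)    # voice the wearer faces: identical at both ears
off_axis = rng.standard_normal(fs)   # voice from one side: delayed at the far ear
delay = 8                            # ~0.5 ms inter-ear delay, in samples

print(dominant_lag(frontal, frontal))                     # expected 0: aligned, frontal
print(dominant_lag(np.roll(off_axis, delay), off_axis))   # expected 8: misaligned, off-axis

# Two loud voices from the same direction are both aligned at zero lag,
# which is why enrollment fails in that case: the cue cannot tell them apart.
two_frontal_mix = frontal + rng.standard_normal(fs)
print(dominant_lag(two_frontal_mix, two_frontal_mix))     # still 0 for both voices
```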

The research team has open-sourced its code and datasets, and it encourages future research to improve target speech hearing.
