With isolation and social distancing during the pandemic, everyday life has changed in many ways. One of those changes is the rise of conference calls with friends and family. But these calls often turn into a wall of noise, with many people talking at once and no way to tell who is speaking.
Facebook AI Research (FAIR) has introduced a method that separates up to five voices speaking simultaneously into a single microphone. Previous models use a mask and a decoder to sort out each speaker’s voice, and their performance degrades rapidly as the number of speakers grows. To solve this, FAIR uses a novel recurrent neural network architecture that works directly on the raw audio waveform. Like other speech separation systems, this one requires the number of speakers to be known in advance. To handle cases where the number of speakers is unknown, the researchers built a new system that automatically detects the number of speakers and selects the most relevant model.
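The model-selection idea can be sketched in a few lines. The article does not spell out the mechanism, so this is only one plausible reading: keep one separator per speaker count, start with the largest, and step down while any output channel is silent. The names `select_and_separate`, `channel_energy` and `make_toy_separator`, the energy threshold, and the toy separator itself are all assumptions made for illustration, not FAIR's code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Waveform = Sequence[float]
# Hypothetical interface: a separator maps one mixture to a fixed number of channels.
Separator = Callable[[Waveform], List[List[float]]]


def channel_energy(channel: Waveform) -> float:
    """Mean squared amplitude, a crude stand-in for a real speech detector."""
    return sum(s * s for s in channel) / max(len(channel), 1)


def select_and_separate(
    mixture: Waveform,
    models: Dict[int, Separator],
    threshold: float = 1e-4,  # assumed silence threshold, not from the article
) -> Tuple[int, List[List[float]]]:
    """Try models from the largest speaker count down; stop when no channel is silent."""
    for n_speakers in sorted(models, reverse=True):
        channels = models[n_speakers](mixture)
        if all(channel_energy(ch) > threshold for ch in channels):
            return n_speakers, channels
    # Every model produced a silent channel: fall back to the smallest one.
    smallest = min(models)
    return smallest, models[smallest](mixture)


def make_toy_separator(n_outputs: int, true_sources: List[List[float]]) -> Separator:
    """Toy stand-in for a trained model: returns the known sources, padded with silence."""
    def separate(mixture: Waveform) -> List[List[float]]:
        channels = [list(src) for src in true_sources[:n_outputs]]
        while len(channels) < n_outputs:
            channels.append([0.0] * len(mixture))
        return channels
    return separate


# Two speakers mixed together, with toy "models" for two to five speakers.
sources = [[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, -0.1, -0.2]]
mixture = [a + b for a, b in zip(*sources)]
models = {n: make_toy_separator(n, sources) for n in range(2, 6)}
n_detected, channels = select_and_separate(mixture, models)
print(n_detected, len(channels))
```

In a real system the energy check would be replaced by a proper voice activity detector, and each entry in `models` would be a trained network rather than a toy function.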
The goals of the speech separation tool are:
- Estimate the original source signals, one per speaker
- Given an input mixture of speech signals, generate a separate output channel for each speaker
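To make these two goals concrete, here is a minimal sketch of the task setup. It assumes the mixture is simply the sample-wise sum of the speakers' waveforms. It also assumes the output channels can come back in any order, so estimates are matched to references by trying every ordering. The helper names `mix`, `mse` and `best_assignment` are hypothetical and only illustrate the idea.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def mix(sources: Sequence[Sequence[float]]) -> List[float]:
    """Sum equal-length source waveforms sample by sample into one mixture."""
    return [sum(samples) for samples in zip(*sources)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_assignment(
    estimates: Sequence[Sequence[float]],
    references: Sequence[Sequence[float]],
) -> Tuple[Tuple[int, ...], float]:
    """Return (order, error), where order[i] is the estimate matched to reference i."""
    best_order: Tuple[int, ...] = ()
    best_err = float("inf")
    for order in permutations(range(len(estimates))):
        err = sum(
            mse(estimates[j], references[i]) for i, j in enumerate(order)
        ) / len(references)
        if err < best_err:
            best_order, best_err = order, err
    return best_order, best_err


# Two reference speakers and a "separator" output that returns them swapped.
references = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
mixture = mix(references)
estimates = [references[1], references[0]]
order, error = best_assignment(estimates, references)
print(mixture, order, error)
```

Trying every ordering is only practical for a handful of speakers, which fits the up-to-five-voices setting described here.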
What does this innovation provide to society?
- For people who use hearing aids, this system can help them hear others clearly in crowded places such as restaurants.
- This system can also be used to separate background noise from an input speech signal effectively.
Further wonders of this system await!