Researchers at UC Berkeley have developed an AI model that detects 'silent speech': words that are mouthed but not voiced. The model uses digital voicing to predict the words and generate synthetic speech. The silent speech is detected via electromyography (EMG), with electrodes placed on the face and throat.
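Before any prediction can happen, the raw multichannel EMG signal is typically cut into short overlapping frames and summarized as features. As a minimal illustrative sketch (the frame sizes, sampling rate, and RMS feature here are assumptions, not the paper's actual preprocessing):

```python
import numpy as np

def frame_emg(signal, frame_len=200, hop=100):
    """Split a multichannel EMG signal (samples x channels) into
    overlapping frames and compute one RMS amplitude per channel.
    Frame and hop sizes are illustrative, not the paper's settings."""
    n_samples, n_channels = signal.shape
    n_frames = 1 + (n_samples - frame_len) // hop
    feats = np.empty((n_frames, n_channels))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        feats[i] = np.sqrt((frame ** 2).mean(axis=0))  # RMS per channel
    return feats

# Example: 1 second of synthetic 8-channel EMG sampled at 1 kHz
emg = np.random.randn(1000, 8)
features = frame_emg(emg)
print(features.shape)  # (9, 8): 9 frames, one RMS value per channel
```

A frame-level representation like this is what a downstream model would consume when mapping muscle activity to speech.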
The researchers assert that the model can enable many applications for people who cannot produce audible speech, and can let AI tools and other voice-controlled devices accept commands without audible input.
The team states that digitally voicing silent speech has broad applications. For instance, it could yield a Bluetooth-headset-like device that lets individuals continue phone conversations without disturbing those around them. Such a device would be valuable in environments too loud for audible speech to be captured, or where maintaining silence is essential.
Lip-reading AI is another approach to capturing words from silent speech; it can power monitoring devices and support applications for deaf users.
The researchers' method transfers audio output targets for each statement from vocalized recordings to the corresponding silent recordings. Audible speech predictions are then generated from the model's output using a WaveNet decoder.
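The target-transfer step depends on aligning a silent recording with a vocalized recording of the same sentence, since the two differ in timing. A minimal sketch of one way to do this with dynamic time warping (DTW); the toy one-dimensional features, frame counts, and transfer rule below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def dtw_path(a, b):
    """Dynamic time warping between feature sequences a (n x d) and b (m x d).
    Returns a monotone list of (i, j) pairs aligning frames of a to frames of b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to recover the alignment path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy features: silent utterance (6 frames) and vocalized utterance (8 frames)
silent = np.linspace(0, 1, 6)[:, None]
vocal = np.linspace(0, 1, 8)[:, None]
audio_targets = np.arange(8)[:, None]  # audio features paired with vocalized frames

path = dtw_path(silent, vocal)
# Transfer: each silent frame inherits the audio target of its aligned vocalized frame
targets_for_silent = {}
for i, j in path:
    targets_for_silent.setdefault(i, audio_targets[j])
```

After this transfer, every silent-EMG frame has an audio target to train against, even though no audio was recorded during the silent utterance.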
Compared with a baseline trained only on vocalized EMG data, the target-transfer approach reduces the word error rate on transcriptions of sentences read from books from 64% to 4%, a roughly 95% relative error reduction over the baseline. The researchers have also open-sourced a dataset of about 20 hours of facial EMG data to encourage further studies in the domain.
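Word error rate, the metric behind these figures, is the word-level edit distance between a transcription and the reference, normalized by reference length. A self-contained sketch (real evaluations typically use a library such as jiwer, and may normalize text first):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # substitution or match
                           dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1)        # insertion
    return dp[-1][-1] / len(ref)

print(word_error_rate("the cat sat on the mat",
                      "the cat sat on a mat"))  # 1 substitution / 6 words ≈ 0.167
```

In these terms, the reported improvement means transcriptions of the synthesized speech went from getting roughly two thirds of the words wrong to getting 4% wrong.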
In other work, researchers in China introduced a sarcasm detection model that delivered SOTA performance on a multimodal Twitter dataset, and members of the Masakhane open-source project for translating African languages have published a case study on low-resource machine translation.