Audio codecs compress sound so that it takes up less storage and less bandwidth to transmit. They are essential for streaming, because they reduce the amount of data you use while listening. Ideally, an audio codec should be “transparent” to listeners: the decoded audio should be perceptually indistinguishable from the original recording, and the encoding/decoding process should not add perceptible latency.
Recent audio codecs have made clear, crisp sound widely available. Opus and EVS are two examples that deliver high quality at medium-to-low bitrates (12–20 kbps). However, at very low bitrates (around 3 kbps), their quality degrades sharply. In the search for better audio compression, researchers have turned to machine learning techniques that learn the encoding from data. This has opened up new possibilities for compressing audio further with less loss of quality.
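To put the bitrates above in perspective, the following minimal sketch converts each one into data used per minute of audio. The uncompressed reference (16-bit mono PCM at 16 kHz, i.e. 256 kbps) is an assumption chosen for illustration and does not come from the article; the function and variable names are likewise hypothetical.

```python
def kilobytes_per_minute(bitrate_kbps: float) -> float:
    """Convert a bitrate in kilobits per second to kilobytes per minute."""
    bits_per_minute = bitrate_kbps * 1000 * 60
    return bits_per_minute / 8 / 1000


# Assumption (not from the article): the uncompressed reference is
# 16-bit mono PCM sampled at 16 kHz, which works out to 256 kbps.
PCM_KBPS = 16_000 * 16 / 1000

# Bitrates quoted in the article, plus the assumed PCM reference.
RATES_KBPS = {
    "uncompressed PCM (assumed)": PCM_KBPS,
    "20 kbps": 20.0,
    "12 kbps": 12.0,
    "3 kbps": 3.0,
}


def data_usage_table(rates: dict) -> dict:
    """Map each label to the kilobytes consumed by one minute of audio."""
    return {label: kilobytes_per_minute(kbps) for label, kbps in rates.items()}


if __name__ == "__main__":
    for label, kb in data_usage_table(RATES_KBPS).items():
        ratio = PCM_KBPS / RATES_KBPS[label]
        print(f"{label:>28}: {kb:7.1f} kB/min ({ratio:.1f}x smaller than PCM)")
```

Under these assumptions, one minute of audio at 3 kbps needs 22.5 kB, compared with 90 kB at 12 kbps, which is why quality at very low bitrates matters for listeners on constrained connections.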
In early 2021, the Google AI team released Lyra, a neural audio codec for low-bitrate speech. Now the team is introducing ‘SoundStream’, which it describes as the first neural network codec to work on both speech and music while running in real time on a smartphone CPU. ‘SoundStream’ is designed to provide higher-quality audio and to encode different sound types, including clean speech, noisy and reverberant speech, music, and environmental sounds. According to Google, the new codec delivers state-of-the-art quality across a range of bitrates.
With the proliferation of streaming media, it is becoming increasingly important to find new ways to improve audio compression. SoundStream uses machine learning-driven algorithms that are reported to outperform existing standards, and it requires only a single scalable model rather than separate models of varying complexity for each application.
Google’s AI blog says that SoundStream will be released as part of the next, improved version of Lyra. With this integration, developers will be able to use the existing Lyra APIs and tools while gaining both better sound quality and more flexibility in their projects.
Asif Razzaq is an AI Journalist and Cofounder of Marktechpost, LLC. He is a visionary, entrepreneur and engineer who aspires to use the power of Artificial Intelligence for good.
Asif's latest venture is the development of an Artificial Intelligence Media Platform (Marktechpost) that will revolutionize how people can find relevant news related to Artificial Intelligence, Data Science and Machine Learning.
Asif was featured by Onalytica in its ‘Who’s Who in AI? (Influential Voices & Brands)’ as one of the ‘Influential Journalists in AI’ (https://onalytica.com/wp-content/uploads/2021/09/Whos-Who-In-AI.pdf). His interview was also featured by Onalytica (https://onalytica.com/blog/posts/interview-with-asif-razzaq/).