Google AI Introduces ‘SoundStream’, A Neural Audio Codec That Provides High-Quality Audio For Various Sound Classes, Improving Machine Learning-Driven Audio Codecs


Audio codecs compress sound files so that they take up less space and less time to transmit. They are essential for streaming because they reduce the amount of data you use while listening. Ideally, an audio codec is “transparent” to consumers: the decoded audio is perceptually indistinguishable from the original recording, and the encoding/decoding process adds no perceptible latency.

Recent audio codecs have been instrumental in delivering clear, crisp sound to a wide range of listeners. Opus and EVS are two examples that deliver high quality at medium-to-low bitrates (12–20 kbps). However, at very low bitrates (around 3 kbps), their quality degrades sharply. In the search for better audio compression, researchers have turned to machine learning techniques that learn the encoding from data. This opens up new possibilities for compressing audio further with less loss of quality.
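To put those bitrates in perspective, the short sketch below works out how much data one minute of audio takes at each rate. The baseline of uncompressed 16 kHz, 16-bit mono PCM is an assumption chosen for illustration (it is not a figure from Google's announcement), and the helper names are hypothetical.

```python
def kilobytes_per_minute(bitrate_kbps: float) -> float:
    """Data volume of one minute of audio at a constant bitrate, in kilobytes."""
    bits = bitrate_kbps * 1000 * 60  # kbps -> bits per minute
    return bits / 8 / 1000           # bits -> bytes -> kilobytes


def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio, in kbps."""
    return sample_rate_hz * bit_depth * channels / 1000


# Assumed baseline for illustration: uncompressed 16 kHz, 16-bit mono PCM.
baseline_kbps = pcm_bitrate_kbps(16_000, 16)

# Compare the baseline with the bitrates mentioned in the article.
for kbps in (baseline_kbps, 20, 12, 3):
    ratio = baseline_kbps / kbps
    print(f"{kbps:6.1f} kbps -> {kilobytes_per_minute(kbps):7.1f} kB/min "
          f"({ratio:.1f}x smaller than the baseline)")
```

Under that assumed baseline, a codec running at 3 kbps has to represent each minute of audio in roughly one eighty-fifth of the uncompressed size, which helps explain why quality is so hard to maintain at that rate.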

In early 2021, the Google AI team released Lyra, a neural audio codec for low-bitrate speech. Now, the team is introducing ‘SoundStream’, which it describes as the first neural network codec to work on both speech and music while running in real time on a smartphone CPU. ‘SoundStream’ provides higher-quality audio and encodes a wider range of sound types, including clean speech, noisy and reverberant speech, music, and environmental sounds. According to Google, the new codec delivers state-of-the-art quality over a broad range of bitrates.

With the proliferation of streaming media, it is becoming increasingly important to find new ways to improve audio compression. SoundStream takes a machine learning-driven approach that, according to Google, outperforms existing standards, and it requires only a single scalable model rather than separate models of varying complexity for each application.

Google’s AI blog states that SoundStream will be released as part of the next, improved version of Lyra. This integration will let developers keep using existing APIs and tools while gaining better sound quality and more flexibility in their projects.