Google open-sourced a new audio codec(coder-decoder) named Lyra. Lyra uses machine learning to enable high-quality voice calls with low bandwidth. Lyra can compress audio down to as little as three kbps, ensuring a sound quality compared to other codecs requiring much greater bandwidth.
The COVID-19 pandemic has led to increased remote work and telecommuting; thus, data limits are being stretched even in areas with a reliable connection. Therefore, Google is now making Lyra open-source to help make a difference in such kinds of situations.
The Lyra codec is optimized in such a way that it can squeeze recognizable and natural-sounding human speech into the minimum possible space. Lyra is built using an ML model trained on thousands of hours of audio of common people speaking in more than 70 languages and accents. It also supports low-powered devices, including a smartphone that only has 90 milliseconds of latency.
As it is a codec, Lyra’s architecture consists of two primary components – an encoder and a decoder.
The encoder captures the distinctive attributes, or features, of someone’s speech when spoken on the phone. Lyra inputs these features in 40-millisecond chunks compresses them then sends them over the network. This process is encoding. On the other hand, the decoder converts those features back into an audio waveform that can be readily received and understood by the person at the other end of the call. The traditional codecs are usually based on digital signal processing (DSP) techniques, whereas Lyra uses a generative model to reconstruct a high-quality voice signal efficiently.
What Lyra does is compress the audio in such a way that it can still be recognized. It generates noticeably lower-quality audio than what many people are used to (i.e., with a normally encoded recording), that is still distinctly recognizable nevertheless.
The Open Source Release
Thus, Google is helping the parts of the world with less bandwidth network with Lyra. Open-sourcing Lyra is a good step as it creates transparency and trust for third-party developers to use the service. Google has implemented Lyra in its free Duo video calling application. It announced that it’s open-sourcing the code because it believes it might be suitable for more applications.
The Lyra code has been written in C++ for good speed, efficiency, and interoperability. It uses the Bazel build a framework with Abseil to write the code, and for thorough unit testing, it uses the GoogleTest framework. The core API of Lyra provides an interface for encoding and decoding at file and packet levels. The complete signal processing toolchain, including various filters and transforms, is provided. It also provides the weights and vector quantizers necessary to run Lyra. Lyra is released as a beta version so as to involve more developers and get feedback as soon as possible.
Google Open-Source: https://opensource.googleblog.com/2021/04/lyra-enabling-voice-calls-for-next-billion-users.html