Meta Uses Artificial Intelligence (AI) To Compress Audio Files For Quick Sharing

Even with today’s cutting-edge technology, rich multimedia experiences such as sharing high-quality images, audio messages, and video streams still require a fast internet connection and plenty of storage space.

To overcome these barriers and provide high-quality, uninterrupted experiences for everyone, the Meta team sees better compression as the way forward. Imagine listening to an audio message in a place with poor connectivity without interruptions. Their recent work shows how AI can help reach this goal.

Meta’s Fundamental AI Research (FAIR) group has made progress in AI-driven audio hypercompression that addresses the issues above. The team built a three-part system and trained it end to end to compress audio to a target size; a neural network then decodes the compressed data. They report a compression rate roughly ten times that of MP3 at 64 kbps, meaning about one-tenth the data, while maintaining the same level of audio quality.
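To make the "ten times" figure concrete, here is a small back-of-the-envelope calculation. It assumes a constant bitrate and a hypothetical 30-second voice message; only the 64 kbps baseline and the roughly tenfold factor come from the article.

```python
def compressed_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Size in bytes of audio encoded at a constant bitrate."""
    return bitrate_kbps * 1000 * duration_s / 8


MP3_KBPS = 64.0          # baseline bitrate quoted in the article
COMPRESSION_FACTOR = 10  # "around ten times", as reported in the article
DURATION_S = 30.0        # hypothetical voice-message length (assumption)

mp3_bytes = compressed_size_bytes(MP3_KBPS, DURATION_S)
neural_bytes = compressed_size_bytes(MP3_KBPS / COMPRESSION_FACTOR, DURATION_S)

print(f"MP3 at 64 kbps:   {mp3_bytes / 1000:.0f} kB")
print(f"Ten times smaller: {neural_bytes / 1000:.0f} kB")
```

Under these assumptions the same message shrinks from about 240 kB to about 24 kB, which is the kind of saving that matters on a weak connection.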

This novel method achieves state-of-the-art compression and decompression of audio in real time. Much work remains, but the result could be faster, higher-quality calls in low-bandwidth situations and the ability to provide immersive metaverse experiences without upgrading network infrastructure.

Codecs such as MP3, Opus, and EVS, which encode and decode data streams, power most of the audio compression used today. Traditional codecs combine frequency decomposition with insights into human hearing (psychoacoustics) to encode data efficiently. However, they offer only a limited set of carefully handcrafted options for encoding and decoding files.

To reconstruct the input signal, the team developed EnCodec, a neural network trained end to end. It has three parts:

  1. The encoder is responsible for converting the raw data into a representation with a lower frame rate and higher dimensions.
  2. The quantizer reduces the representation size to the target size.
  3. A decoder decompresses the signal and reconstructs a waveform that’s nearly identical to the original.
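The three parts above can be illustrated with a toy pipeline. This is only a sketch of the encoder, quantizer, and decoder roles using plain vector quantization with a tiny hand-written codebook; the frame size, the codebook, and the function names are all hypothetical, and the real system learns these mappings with neural networks rather than using fixed rules.

```python
import math

FRAME = 4  # samples per frame (hypothetical; a real model learns this mapping)

# Hypothetical fixed codebook: each entry is one short waveform shape.
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [-0.5, -0.5, -0.5, -0.5],
    [0.0, 0.5, 0.0, -0.5],
]


def encode(samples):
    """Encoder: group samples into frames (fewer time steps, more values per step)."""
    usable = len(samples) - len(samples) % FRAME  # drop a trailing partial frame
    return [samples[i:i + FRAME] for i in range(0, usable, FRAME)]


def quantize(frames):
    """Quantizer: replace each frame with the index of its nearest codebook entry."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(CODEBOOK)), key=lambda k: dist(frame, CODEBOOK[k]))
        for frame in frames
    ]


def decode(indices):
    """Decoder: look up each index and concatenate the frames into a waveform."""
    return [x for k in indices for x in CODEBOOK[k]]


# A short sine wave stands in for real audio.
signal = [0.5 * math.sin(2 * math.pi * n / 8) for n in range(16)]
codes = quantize(encode(signal))
restored = decode(codes)

bits_per_frame = math.log2(len(CODEBOOK))
print("codes:", codes)
print("bits stored per frame:", bits_per_frame)
```

Only the list of small integers in `codes` needs to be stored or transmitted. The reconstruction here is crude because the codebook has four entries; a learned model with much larger codebooks is what makes a near-identical waveform possible.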

Human annotators evaluated the method, comparing it against other compression methods such as Google’s newest codec, Lyra-v2. Their findings show that the new method achieves state-of-the-art performance in low-bit-rate speech audio compression (1.5 kbps to 12 kbps). The model can encode and decode audio in real time using just one CPU core, regardless of the bandwidth or quality setting.
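The article does not say how a quantizer reaches a given bitrate. One common scheme, assumed here purely for illustration, stores each frame as one or more codebook indices, so the bitrate is the frame rate times the number of indices per frame times the bits per index. The frame rate and codebook size below are assumptions, not figures from the article.

```python
import math


def bitrate_kbps(frames_per_second: int, codebook_size: int, num_codebooks: int) -> float:
    """Bitrate when every frame is stored as `num_codebooks` codebook indices."""
    bits_per_index = math.log2(codebook_size)
    return frames_per_second * num_codebooks * bits_per_index / 1000


FRAMES_PER_SECOND = 75  # assumed frame rate of the compressed representation
CODEBOOK_SIZE = 1024    # assumed entries per codebook (10 bits per index)

for num_codebooks in (2, 4, 8, 16):
    rate = bitrate_kbps(FRAMES_PER_SECOND, CODEBOOK_SIZE, num_codebooks)
    print(f"{num_codebooks:2d} codebooks -> {rate:.1f} kbps")
```

With these assumed settings, two indices per frame give 1.5 kbps and sixteen give 12 kbps, which spans the range quoted above. Changing the number of indices per frame is one way a single model could serve several bandwidth settings.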

The researchers believe that even smaller sizes are achievable. They therefore plan to study further the trade-off between processing time and compressed audio size that is practical in real-world use. Future improvements to dedicated processors, like those found in phones and laptops today, could help compress and decompress files while using less power.

Although this method does not yet extend to video, the researchers believe that their work will help the research community improve scenarios like online video chats, online movie viewing, and multiplayer virtual reality gaming.

As for their future goals, the team intends to investigate spatial audio compression, which will call for a method that can compress multiple audio channels while maintaining correct spatial information. 

This article is a research summary written by Marktechpost Staff based on the research paper 'High Fidelity Neural Audio Compression'. All credit for this research goes to the researchers on this project.