Researchers at Sony Computer Science Laboratories (CSL) have developed a new deep learning method to enhance and restore the quality of heavily compressed songs and audio recordings

Today, many sophisticated tools and technologies allow us to store vast amounts of music and audio recordings on electronic devices. Codecs, each of which pairs an encoder with a decoder, are used to encode, compress, and decode media files.

Codecs fall into two categories: lossless and lossy. Lossless codecs, such as those used by PKZIP and PNG, reproduce the original file exactly after decompression. Lossy codecs, by contrast, produce an approximation of the original that ideally looks and/or sounds the same but takes up less space on electronic devices.
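To make the lossless case concrete, here is a minimal sketch using Python's standard `zlib` module, which implements the DEFLATE algorithm also used by PKZIP and PNG. The payload and the helper name `lossless_round_trip` are made up for illustration.

```python
import zlib

def lossless_round_trip(data: bytes) -> tuple:
    """Compress and decompress data; return the restored bytes and the compressed size."""
    compressed = zlib.compress(data, level=9)
    restored = zlib.decompress(compressed)
    return restored, len(compressed)

# Hypothetical payload: a highly repetitive byte string, chosen because it compresses well.
original = b"la la la " * 1000
restored, compressed_size = lossless_round_trip(original)

print(restored == original)            # the restored bytes are identical to the original
print(len(original), compressed_size)  # original size vs. compressed size in bytes
```

A lossy codec gives up this exact-equality guarantee in exchange for a smaller file.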


Lossy audio codecs work by discarding some of the data in a digital audio stream during compression, so the decompressed signal is only an approximation of the original. Typically, it is difficult or impossible for listeners to tell the original from the decompressed file.

However, at high compression rates, lossy codecs can introduce artifacts and audibly alter the audio signal. Deep learning techniques have recently been used to work around these drawbacks and improve the quality of compressed files.
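The trade-off between compression strength and distortion can be illustrated with a toy example. The sketch below is not how MP3 works (real lossy audio codecs rely on transform coding and psychoacoustic models); it simply rounds samples to a coarser grid, assuming a hypothetical 440 Hz test tone, to show that keeping fewer bits per sample increases the error.

```python
import math

def quantize(samples, bits):
    """Round samples in [-1, 1] to a grid with 2**bits levels (a toy lossy step)."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    return [round((s + 1.0) / step) * step - 1.0 for s in samples]

def rms_error(a, b):
    """Root-mean-square difference between two equally long sample lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical test signal: a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]

# Fewer bits per sample means less data to store and a larger error.
errors = {bits: rms_error(signal, quantize(signal, bits)) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits} bits per sample: RMS error {err:.4f}")
```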

Researchers at Sony Computer Science Laboratories (CSL) have created a deep learning technique that restores and improves the quality of heavily compressed music and audio recordings. Their approach builds on generative adversarial networks (GANs), machine learning models in which two neural networks are trained against each other: one produces outputs, and the other judges how realistic they are.

The proposed model consists of two networks, the “generator (G)” and the “critic (D).” The generator receives an excerpt of an MP3-compressed musical audio signal, represented as a spectrogram, which is a visual representation of how the signal’s frequency content changes over time.
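For readers unfamiliar with spectrograms, the following is a minimal, dependency-free sketch of how a magnitude spectrogram can be computed from raw samples. The frame length, hop size, window, and test tone are all assumptions chosen for illustration; the summary does not specify the paper's settings, and a real pipeline would normally use an FFT library rather than this slow direct transform.

```python
import cmath
import math

def magnitude_spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each holding magnitudes for bins 0..frame_len // 2."""
    # A Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct discrete Fourier transform for bin k (slow, but needs no libraries).
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            spectrum.append(abs(total))
        frames.append(spectrum)
    return frames

# Hypothetical input: a 1 kHz tone sampled at 8 kHz. Each bin spans 8000 / 64 = 125 Hz,
# so the tone's energy is centred on bin 1000 / 125 = 8.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
spec = magnitude_spectrogram(tone)

peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)  # number of frames, bins per frame, strongest bin
```

Each row of `spec` is one time slice and each column one frequency band, which is what lets an image-style neural network operate on audio.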

During training, the generator gradually improves its ability to produce a restored version of the original signal. Meanwhile, the critic learns to distinguish the original, high-quality files from the restored ones. The critic's feedback is then used to improve the restorations, so that the music or audio in the restored files is as close to the original as possible.
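The feedback loop between the two networks can be sketched with a pair of loss functions. The term "critic" is commonly associated with Wasserstein-style GANs, but this summary does not say which loss the authors used, so the formulas below are an assumption, and the scores are invented numbers standing in for the critic's outputs.

```python
def mean(values):
    return sum(values) / len(values)

def critic_loss(scores_original, scores_restored):
    """Wasserstein-style critic loss (assumed): lower when originals outscore restorations."""
    return mean(scores_restored) - mean(scores_original)

def generator_loss(scores_restored):
    """The generator's loss falls as the critic scores its restorations more highly."""
    return -mean(scores_restored)

# Hypothetical critic scores (higher means "looks more like an original recording").
early = {"original": [0.9, 1.1, 1.0], "restored": [-1.0, -0.8, -1.2]}
late = {"original": [0.9, 1.1, 1.0], "restored": [0.7, 0.9, 0.8]}

for name, scores in (("early", early), ("late", late)):
    print(name,
          round(critic_loss(scores["original"], scores["restored"]), 3),
          round(generator_loss(scores["restored"]), 3))
```

In this toy setup, the generator's loss drops between the "early" and "late" stages because the critic finds the later restorations harder to tell apart from the originals.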

In a series of tests, the researchers assessed the performance of their GAN-based architecture. The main objective was to see whether it could improve the quality of MP3 inputs and produce restored samples that were better, and closer to the original file, than those produced by existing baseline models. Their findings show that the model’s restorations of severely compressed MP3 songs (16 kbit/s and 32 kbit/s) frequently sounded better to expert human listeners than the compressed files themselves. At a lower compression rate (64 kbit/s mono), however, the team found that their model produced slightly worse results.
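To put these bitrates in perspective, the size of a constant-bitrate file is simply the bitrate multiplied by the duration. The sketch below assumes a hypothetical three-minute song and compares the bitrates mentioned above with uncompressed CD audio, which is included only for scale (CD audio is stereo, whereas the study's MP3s are described as mono).

```python
def file_size_kilobytes(bitrate_kbit_s, duration_s):
    """Constant-bitrate size: kbit/s * seconds / 8 bits per byte, in kB (1 kB = 1000 bytes)."""
    return bitrate_kbit_s * duration_s / 8

duration_s = 180  # hypothetical three-minute song
# Uncompressed CD audio: 44,100 samples/s * 16 bits * 2 channels = 1,411.2 kbit/s.
cd_kbit_s = 44_100 * 16 * 2 / 1000

sizes = {rate: file_size_kilobytes(rate, duration_s) for rate in (16, 32, 64, cd_kbit_s)}
for rate, size in sizes.items():
    print(f"{rate:>7} kbit/s -> {size:,.0f} kB")
```

At 16 kbit/s the song occupies a small fraction of the space needed at 64 kbit/s, which is why restoring quality at very low bitrates is attractive.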

According to the paper, the architecture can generate and add realistic high-frequency content that improves the audio quality of compressed music. The generated content included percussive elements, guitar sounds, and the sibilants of singing voices.

The team believes that their work could allow MP3 audio files to be made significantly smaller without degrading their quality or introducing artifacts audible to the human ear.

This article is a research summary written by Marktechpost Staff, based on the research paper 'Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks'. All credit for this research goes to the researchers on this project.
