Google AI Proposes a Machine Learning (ML) Based Audio Separation Approach That Can Identify Birdsongs for Better Species Classification

Birds are identifiable not only by their appearance but also by their songs, and careful listening reveals a great deal about our surroundings. Ecologists use birds to study food webs and forest health: if a forest has many woodpeckers, for example, it is reasonable to assume it contains a lot of dead wood. Because birds use songs and calls to communicate and mark their territory, identifying them by ear is often the most efficient method. Autonomous recording units (ARUs) were designed to do this listening at scale.

ARUs have captured thousands of hours of audio that could help researchers better understand ecosystems and identify critical habitats. Manually reviewing that audio, however, takes a long time, and expert knowledge of birdsong is rare. Machine learning (ML) can drastically reduce the amount of expert review needed to understand a habitat.

Classifying bird species from audio with machine learning is harder than it might appear. Birds often sing over one another, which makes individual voices difficult to distinguish. Training data is typically recorded in noisy environments, so there are no clean recordings of individual birds to learn from. As a result, current birdsong classification models struggle with quiet, distant, and overlapping vocalizations. In addition, some of the most common species often appear unlabelled in the background of training recordings, which leads models to overlook them.

Google AI presented a novel unsupervised method called mixture invariant training (MixIT). It addresses the fundamental difficulty of training ML models to separate audio recordings automatically without access to examples of already-separated sounds.

Separation of Audio:

MixIT learns to split single-channel recordings into multiple individual tracks, and it can be trained entirely on noisy, real-world recordings. To train the separation model, two real-world recordings are mixed to create a "mixture of mixtures" (MoM). The separation model learns to split the MoM into several channels so as to minimize a loss function. That loss uses the two original recordings as ground-truth references: it groups the separated channels so that, when summed, they reproduce the two original recordings. Since the separation model has no way of knowing how the different sounds in the MoM were grouped in the original recordings, it has no alternative but to separate the individual sounds.
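The grouping step in the loss can be illustrated with a small sketch. It is not the authors' implementation: it assumes a plain mean-squared error in place of the loss used in the paper, it tries every possible channel grouping by brute force, and the name `mixit_loss` is hypothetical. The separation network itself is left out; the "separated" channels are simply supplied as an array.

```python
import itertools
import numpy as np

def mixit_loss(separated, ref_a, ref_b):
    """Try every way of assigning channels to the two references.

    separated: array of shape (num_channels, num_samples)
    ref_a, ref_b: the two original recordings, each of shape (num_samples,)
    Returns the lowest remix error and the assignment that produced it,
    where assignment[i] == 0 sends channel i to ref_a and 1 sends it to ref_b.
    Assumption: mean-squared error stands in for the paper's loss.
    """
    num_channels = separated.shape[0]
    best_loss, best_assignment = np.inf, None
    for assignment in itertools.product((0, 1), repeat=num_channels):
        to_b = np.array(assignment, dtype=bool)
        remix_a = separated[~to_b].sum(axis=0)  # channels grouped with ref_a
        remix_b = separated[to_b].sum(axis=0)   # channels grouped with ref_b
        loss = np.mean((remix_a - ref_a) ** 2) + np.mean((remix_b - ref_b) ** 2)
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment

# Toy check: four synthetic "sounds", two per original recording.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 1000))
ref_a = sources[0] + sources[2]
ref_b = sources[1] + sources[3]

# Pretend the model separated the mixture of mixtures perfectly.
loss, assignment = mixit_loss(sources, ref_a, ref_b)
print(assignment)  # (0, 1, 0, 1)
```

With perfectly separated channels the loss recovers the grouping that rebuilds both recordings. During training the same minimum would be taken over the model's imperfect outputs and used as the training signal.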

Classification of the species:

The audio is first divided into five-second segments, and a mel-spectrogram is computed for each segment. An EfficientNet classifier then identifies bird species from these mel-spectrograms.
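The preprocessing described here can be sketched in NumPy as follows. Only the five-second segment length comes from the article. The sample rate, FFT size, hop length, number of mel bands, and the mel-scale formula are all assumptions chosen for illustration, and the function names are hypothetical. The EfficientNet classifier is not shown.

```python
import numpy as np

def split_into_segments(audio, sample_rate, segment_seconds=5.0):
    """Cut a 1-D signal into fixed-length segments, zero-padding the last one."""
    seg_len = int(sample_rate * segment_seconds)
    num_segments = int(np.ceil(len(audio) / seg_len))
    padded = np.zeros(num_segments * seg_len, dtype=audio.dtype)
    padded[:len(audio)] = audio
    return padded.reshape(num_segments, seg_len)

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lower, centre, upper = hz_edges[i:i + 3]
        rising = (fft_freqs - lower) / (centre - lower)
        falling = (upper - fft_freqs) / (upper - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrogram(segment, sample_rate, n_fft=1024, hop=512, n_mels=64):
    """Log-mel spectrogram of one segment, shape (num_frames, n_mels)."""
    num_frames = 1 + (len(segment) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = segment[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = power @ mel_filterbank(sample_rate, n_fft, n_mels).T
    return np.log(mel_power + 1e-6)  # small offset avoids log(0)

# Twelve seconds of synthetic noise stands in for a field recording.
sample_rate = 22050  # assumed; the article does not state a sample rate
rng = np.random.default_rng(0)
audio = rng.standard_normal(12 * sample_rate)

segments = split_into_segments(audio, sample_rate)
mel = mel_spectrogram(segments[0], sample_rate)
print(segments.shape, mel.shape)  # (3, 110250) (214, 64)
```

Each resulting array can be treated as a single-channel image, which is the kind of input an image classifier expects.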

The researchers found several new techniques that improve classifier training. In taxonomic training, the classifier is asked to predict labels for each level of the species taxonomy (genus, family, and order) in addition to the species itself. This allows the model to learn groupings of species before learning the subtle differences between them. Random low-pass filtering during training also helped by simulating distant sounds: as an audio source moves further away, its high-frequency components fade out before its low-frequency components do. This approach worked well for identifying certain species.
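Both training techniques can be sketched briefly. The taxonomy table below is a three-species illustration rather than the researchers' label set. The low-pass filter is a crude brick-wall filter in the frequency domain, the cutoff range is an assumption, and every function name is hypothetical.

```python
import numpy as np

LEVELS = ("species", "genus", "family", "order")

# Illustrative table: species -> (genus, family, order).
TAXONOMY = {
    "Turdus migratorius": ("Turdus", "Turdidae", "Passeriformes"),
    "Catharus guttatus": ("Catharus", "Turdidae", "Passeriformes"),
    "Dryocopus pileatus": ("Dryocopus", "Picidae", "Piciformes"),
}

def build_vocabularies(taxonomy):
    """Sorted class names for every taxonomic level."""
    vocab = {"species": sorted(taxonomy)}
    for i, level in enumerate(LEVELS[1:]):
        vocab[level] = sorted({ranks[i] for ranks in taxonomy.values()})
    return vocab

def taxonomic_targets(species, taxonomy, vocab):
    """One one-hot target per level, so each level can have its own output head."""
    names = (species,) + taxonomy[species]
    targets = {}
    for level, name in zip(LEVELS, names):
        one_hot = np.zeros(len(vocab[level]))
        one_hot[vocab[level].index(name)] = 1.0
        targets[level] = one_hot
    return targets

def random_low_pass(audio, sample_rate, rng, min_cutoff_hz=1000.0):
    """Zero every frequency bin above a randomly drawn cutoff (assumed range)."""
    cutoff = rng.uniform(min_cutoff_hz, sample_rate / 2.0)
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    spectrum[freqs > cutoff] = 0.0
    return np.fft.irfft(spectrum, n=len(audio)), cutoff

vocab = build_vocabularies(TAXONOMY)
targets = taxonomic_targets("Catharus guttatus", TAXONOMY, vocab)
print(targets["family"])  # [0. 1.]  (Picidae, Turdidae)

rng = np.random.default_rng(0)
sample_rate = 22050  # assumed
audio = rng.standard_normal(sample_rate)
filtered, cutoff = random_low_pass(audio, sample_rate, rng)
```

The two thrushes share family and order targets, so a mistake between them costs less at the coarser levels than a thrush mistaken for a woodpecker. That is one way to understand how the model can learn groups before fine distinctions.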

Shortcomings do exist!

Sometimes a single song is split across several channels, a problem known as over-separation, and this can cause misclassifications. When multiple birds vocalize simultaneously, the most prominent song often receives a lower score after separation. This is thought to be due to the loss of environmental context or to other separation artifacts that do not appear during classifier training. For now, the best results come from running the classifier on the separated channels together with the original audio and taking the highest score for each species.
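One plausible reading of combining the split channels with the original audio is to score every version of the clip and keep the highest score per species. The sketch below assumes that reading. The scores are made-up numbers and `combine_scores` is a hypothetical name.

```python
import numpy as np

def combine_scores(original_scores, channel_scores):
    """Per-species maximum over the original audio and every separated channel.

    original_scores: shape (num_species,)
    channel_scores: shape (num_channels, num_species)
    """
    stacked = np.vstack([original_scores[None, :], channel_scores])
    return stacked.max(axis=0)

# Made-up classifier outputs for three species.
original_scores = np.array([0.6, 0.2, 0.1])
channel_scores = np.array([
    [0.3, 0.7, 0.0],   # separated channel 1
    [0.5, 0.1, 0.05],  # separated channel 2
])

combined = combine_scores(original_scores, channel_scores)
print(combined)  # [0.6 0.7 0.1]
```

Keeping the original audio in the pool means a prominent song that scores lower after separation (the first species here) still keeps its original score, while a quieter bird revealed by separation (the second species) can surface.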

What is in store for the future?

The classifiers are not currently trained on separated audio, which is one area for future improvement. The researchers hope that further work will help them reduce over-separation and find better ways to combine separation with classification. It is also important to understand how habitats and species mixes change after controlled burns and wildfires.

Paper: https://arxiv.org/pdf/2110.03209.pdf

Reference: https://ai.googleblog.com/2022/01/separating-birdsong-in-wild-for.html