Meet audioFlux: A Deep Learning Library for Audio and Music Analysis and Feature Extraction

AudioFlux is a Python library that provides deep learning tools for audio and music analysis and feature extraction. It supports a variety of time-frequency transforms, which are techniques for analyzing an audio signal in the time and frequency domains at the same time. Examples include the short-time Fourier transform (STFT), the constant-Q transform (CQT), and the wavelet transform.
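The following is a minimal pure-Python sketch of the idea behind the STFT, the first transform listed above. It slides a Hann-windowed frame along the signal and takes the magnitude of each frame's DFT. This is not audioFlux's API. The function names `hann_window` and `stft_magnitude`, along with the default frame length of 256 and hop of 128, are illustrative assumptions. A naive O(N²) DFT is used instead of an FFT for readability. The CQT and wavelet transforms are not covered here.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude STFT: one list of |X[k]|, k = 0..frame_len // 2, per frame.

    Uses a naive O(N^2) DFT for clarity; real code would use an FFT.
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current slice of the signal
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # DFT of the windowed slice, keeping only non-negative frequency bins
        spectrum = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


if __name__ == "__main__":
    sr = 8000
    # A 1000 Hz tone: with 256-sample frames each bin spans 8000 / 256 = 31.25 Hz
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(1024)]
    spec = stft_magnitude(tone)
    peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
    print(len(spec), peak_bin, peak_bin * sr / 256)  # prints: 7 32 1000.0
```

Each row of the result is one time frame and each column is one frequency bin. That two-dimensional layout is the time-frequency representation that downstream features are computed from.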

Alongside these transforms, AudioFlux supports hundreds of corresponding time-domain and frequency-domain feature combinations. These features capture characteristics of the audio signal such as its spectral content, temporal dynamics, and rhythmic patterns. They can serve as inputs to deep learning networks for classification, source separation, music information retrieval (MIR), and automatic speech recognition (ASR).

For example, in music classification, AudioFlux could extract features such as the spectral centroid, mel-frequency cepstral coefficients (MFCCs), and zero-crossing rate from a piece of music. These features could then be fed to a deep learning network trained to classify the music into genres such as rock, jazz, or hip-hop. Because it offers a comprehensive set of tools for analyzing and processing audio signals, AudioFlux is a valuable resource for practitioners and researchers who develop and apply methods for audio and music analysis.
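Two of the features named in this example have short textbook definitions. The spectral centroid is the magnitude-weighted mean frequency of a frame's spectrum. The zero-crossing rate is the fraction of adjacent sample pairs whose signs differ. The sketch below implements both in plain Python so the definitions are concrete. It is not audioFlux's implementation. The function names are hypothetical, and the frame is assumed to be unwindowed. MFCCs are left out because they also require a mel filterbank and a DCT.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|X[k]| for k = 0..N // 2 via a naive DFT (illustrative, not fast)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz (0.0 for silence)."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]  # bin k -> k * sr / N Hz
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(256)]
    print(round(spectral_centroid(tone, sr), 3))       # close to 1000 Hz
    print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # every pair crosses
```

In a classifier pipeline, values like these would be computed per frame and stacked with other features, such as MFCCs, to form the network's input.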

The main modules of audioFlux are transform, feature, and mir.

  1. Transform: The transform module offers a range of time-frequency representations. Its BFT, NSGT, CWT, and PWT algorithms support several frequency scale types, including linear, mel, bark, ERB, octave, and logarithmic spectrograms. Other transforms, such as CQT, VQT, ST, FST, DWT, WPT, and SWT, do not support multiple frequency scale types and can only be used as independent transforms. The audioFlux documentation describes each transform's functions and usage in detail. Synchrosqueezing and reassignment algorithms (reassign, synsq, and wsst) are also available to sharpen time-frequency representations, and the documentation covers these techniques as well.
  1. Feature: The feature module offers four algorithms: spectral, xxcc, deconv, and chroma. The spectral algorithm computes spectral features, xxcc computes cepstral coefficients, and deconv performs spectral deconvolution. All three support every spectrum type. The chroma algorithm computes chroma features, but it supports only the CQT spectrum and the BFT-based linear or octave spectra.
  1. MIR: The mir module includes pitch detection (YIN, STFT-based, and other algorithms), onset detection (spectral flux, novelty, and other methods), and harmonic-percussive source separation (hpss) using median filtering or NMF.
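YIN, the first pitch algorithm named in the mir module, is compact enough to sketch in full. The version below is a plain-Python re-implementation of the textbook method. It computes a difference function, normalizes it by its cumulative mean, picks the first lag below an absolute threshold, and refines that lag by parabolic interpolation. It is not audioFlux's code. The function name `yin_pitch`, the 80 to 1000 Hz search range, and the 0.1 threshold are illustrative assumptions.

```python
import math


def yin_pitch(frame, sample_rate, f_min=80.0, f_max=1000.0, threshold=0.1):
    """Estimate the fundamental frequency of one frame, in Hz, with YIN.

    Returns None when no lag dips below the threshold (e.g. silence).
    """
    tau_min = max(1, int(sample_rate / f_max))
    tau_max = int(sample_rate / f_min)
    w = len(frame) - tau_max  # integration window length
    if w <= 0:
        raise ValueError("frame too short for the requested f_min")

    # Step 1: difference function d(tau)
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(w))

    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0

    # Step 3: absolute threshold, then walk down to the local minimum
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            # Step 4: parabolic interpolation for a sub-sample lag estimate
            a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
            denom = a - 2 * b + c
            shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
            return sample_rate / (tau + shift)
    return None


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(1024)]
    print(yin_pitch(tone, sr))          # close to 220 Hz
    print(yin_pitch([0.0] * 1024, sr))  # None: silence has no pitch
```

The cumulative-mean normalization in step 2 is what separates YIN from plain autocorrelation. It keeps the curve near 1 at small lags, so the threshold in step 3 is less likely to lock onto a spurious early dip.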

The library runs on Linux, macOS, Windows, iOS, and Android. In a benchmark against other audio libraries, audioFlux had the shortest processing time. The test used clips of 128 milliseconds each (a sampling rate of 32,000 Hz and a length of 4,096 samples), and the table below shows the time each library takes to extract features from 1,000 such clips.
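As a quick check of the benchmark parameters, 4,096 samples at 32,000 Hz last 4096 / 32000 = 0.128 s, which matches the stated 128 ms per clip. The sketch below shows one plausible way to structure a per-library timing loop over 1,000 clips. It is not the project's benchmark script. The synthetic sine clips, along with `make_clips` and `time_extractor`, are illustrative assumptions, and `rms` is a hypothetical stand-in for whichever library's feature extractor is being timed.

```python
import math
import time

SAMPLE_RATE = 32000  # Hz, from the benchmark description
CLIP_SAMPLES = 4096  # samples per clip, from the benchmark description
NUM_CLIPS = 1000     # clips per library, from the benchmark description

# 4096 samples / 32000 Hz = 0.128 s = 128 ms per clip
clip_ms = CLIP_SAMPLES * 1000 / SAMPLE_RATE


def make_clips(n_clips, n_samples, sample_rate, freq=440.0):
    """Synthetic sine clips standing in for real audio (an assumption)."""
    tone = [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n_samples)]
    return [list(tone) for _ in range(n_clips)]


def time_extractor(extract, clips):
    """Total wall-clock seconds spent calling extract() once per clip."""
    start = time.perf_counter()
    for clip in clips:
        extract(clip)
    return time.perf_counter() - start


def rms(clip):
    """Hypothetical placeholder feature; swap in the extractor under test."""
    return math.sqrt(sum(x * x for x in clip) / len(clip))


if __name__ == "__main__":
    clips = make_clips(NUM_CLIPS, CLIP_SAMPLES, SAMPLE_RATE)
    elapsed = time_extractor(rms, clips)
    print(f"{clip_ms:.0f} ms per clip, {elapsed:.3f} s for {NUM_CLIPS} clips")
```

For a fair comparison, every library would be timed on the same clips and with the same feature settings, ideally over several runs.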


The package documentation is available online.

AudioFlux is open to collaboration and welcomes contributions. To contribute, fork the latest Git repository and create a feature branch. All submissions must pass the continuous integration tests. The project also welcomes suggestions for improvement, including new algorithms, bug reports, feature requests, and general questions, which can be raised by opening an issue on the project's page.


Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT) Kharagpur. She has a keen interest in machine learning, data science, and AI, and is an avid reader of the latest developments in these fields.
