Spotify Research Open-Sources ‘Basic Pitch’: A Machine Learning Tool For Converting Audio Into MIDI

This article is written as a summary by Marktechpost Staff based on the paper 'Meet Basic Pitch: Spotify’s Open Source Audio-to-MIDI Converter'. All credit for this research goes to the researchers of this project. Check out the paper, GitHub, project and post.


Musical Instrument Digital Interface, or MIDI for short, is a communication standard that allows computers, musical instruments, and other gear to speak the same language. It is an essential tool for musicians and producers who work with digital music equipment. MIDI is similar to editable sheet music for computers: it specifies which notes are played and when, in an easy-to-edit format. Musicians typically create MIDI notes with a hardware controller, such as a MIDI keyboard, or by entering the notes manually in their software. While practically all modern artists use MIDI, composing original works with it is difficult, because it is hard for a computer to interpret live performances on real instruments. This is a significant issue for musicians who prefer to sing their ideas rather than use piano keyboards or complex music software.

To make the process easier for artists, a team of researchers from Spotify’s Audio Intelligence Lab, in collaboration with Soundtrap, developed Basic Pitch, a free, open-source tool for converting audio to MIDI. Basic Pitch uses machine learning to transcribe the musical notes in a recording. Unlike many other ML models of its kind, Basic Pitch is not only adaptable and accurate but also quick and computationally light. The vision behind the tool was to give artists and producers an easy way to turn their recorded ideas into MIDI.
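To make the "editable sheet music" comparison concrete, here is a minimal Python sketch of notes stored as data. The `NoteEvent` class and the helper functions are hypothetical names chosen for illustration and are not part of Basic Pitch or of any MIDI library. The only fixed facts used are the standard MIDI conventions that note 69 is the A at 440 Hz and that each semitone is one note number.

```python
import math
from dataclasses import dataclass

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


@dataclass
class NoteEvent:
    """One note: which pitch is played and when (illustrative type only)."""
    pitch: int           # MIDI note number, 0-127 (60 = middle C, 69 = A at 440 Hz)
    start: float         # onset time in seconds
    end: float           # release time in seconds
    velocity: int = 100  # how hard the note is played, 1-127


def frequency_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number for a frequency, with A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def note_name(pitch: int) -> str:
    """Readable name such as 'C4', using the convention that note 60 is C4."""
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"


# A three-note idea (C, E, G), written down as editable data.
melody = [
    NoteEvent(frequency_to_midi(261.63), 0.0, 0.5),
    NoteEvent(frequency_to_midi(329.63), 0.5, 1.0),
    NoteEvent(frequency_to_midi(392.00), 1.0, 2.0),
]

# Editing is just changing numbers: move every note up by two semitones.
transposed = [NoteEvent(n.pitch + 2, n.start, n.end, n.velocity) for n in melody]

print([note_name(n.pitch) for n in melody])
print([note_name(n.pitch) for n in transposed])
```

An audio-to-MIDI converter has to produce a list like `melody` from a recording, which is the hard part that a transcription model takes on.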

Basic Pitch improves on other note-detection algorithms because it can track multiple notes simultaneously and works across a variety of instruments. Existing systems typically handled only one note at a time or were designed for a specific instrument. Important information about pitch bending is also frequently lost when musical performances are converted to MIDI; Basic Pitch supports it right out of the box. Furthermore, Basic Pitch is computationally inexpensive and can run quickly, in real time, on most modern systems. Together, these properties let Basic Pitch translate audio into MIDI with a high degree of nuance and accuracy. The MIDI output can then be imported into a digital audio workstation for further editing.

Basic Pitch uses a neural network that predicts MIDI note events from the audio input. It is generally challenging to create ML models that are both accurate and efficient, and the most straightforward way to do so is to build extensive training datasets. The team acknowledges that huge models built for specific use cases can produce excellent results. However, they wanted a model that could handle input from various instruments and polyphonic recordings, making it versatile and suitable for a wide range of artists. The critical problem was building a light model with the same accuracy as a heavier one.
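As a rough illustration of the pitch detail that can be lost in conversion, the sketch below splits a frequency into the nearest MIDI note plus a leftover offset, then encodes that offset as a standard 14-bit MIDI pitch-bend value. It does not show how Basic Pitch itself works. The function names are hypothetical, and the bend range of plus or minus 2 semitones is an assumption (a common default that can differ between instruments).

```python
import math

BEND_CENTER = 8192        # 14-bit pitch-bend value that means "no bend"
BEND_MAX = 16383          # largest 14-bit value
BEND_RANGE_SEMITONES = 2  # assumed range: +/- 2 semitones


def split_pitch(freq_hz: float) -> tuple[int, float]:
    """Return (nearest MIDI note, offset from that note in semitones)."""
    exact = 69 + 12 * math.log2(freq_hz / 440.0)  # fractional MIDI pitch
    note = round(exact)
    return note, exact - note


def bend_value(offset_semitones: float,
               bend_range: float = BEND_RANGE_SEMITONES) -> int:
    """Encode a semitone offset as a 14-bit pitch-bend value, clamped to range."""
    raw = BEND_CENTER + round(offset_semitones / bend_range * BEND_CENTER)
    return max(0, min(BEND_MAX, raw))


# A note sung a quarter of a semitone sharp of A4 (440 Hz).
note, offset = split_pitch(440.0 * 2 ** (0.25 / 12))
print(note, round(offset, 3), bend_value(offset))
```

A converter that keeps only `note` and discards `offset` flattens slides and vibrato into fixed pitches, which is the loss described above.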

Source: https://engineering.atspotify.com/2022/06/meet-basic-pitch/

Speed was a significant consideration because the tool was designed for musicians rather than researchers. The team knew that shrinking the model would improve speed, so their goal was a machine learning model that was both lightweight and accurate. Drawing on a variety of approaches from previous research by Spotify’s Audio Intelligence Lab and other researchers, they developed a lightweight model for Basic Pitch. Its accuracy and versatility show in its ability to recognize notes from various instrument types, including vocal performances, which are somewhat tricky to transcribe. The primary goal behind making the tool open source was to encourage other machine learning researchers to build lightweight models instead of relying on more computationally intensive mainstream methods. The team is also excited about the many opportunities for music creators, software developers, and researchers to use Basic Pitch on real-world data, and is open to any changes and suggestions that come from future studies.