Get Ready to Rock with Riffusion: The Artificial Intelligence (AI) Model That Brings Music to Life Through Visualization

Imagine music generated by Artificial intelligence. It sounds quite innovative and has been made possible using machine learning. This is done using training Neural network models like LSTM with musical notes and then predicting or generating music.

Diffusion, a technology that was recently introduced, has come up with another unique method that creates bizarre music using audio pictures rather than actual audio. The open-source AI model called Stable Diffusion, which creates images out of the text, was modified to generate images of spectrograms (The frequency content of a sound clip can be represented visually by an audio spectrogram) which can then be converted to audio clips. This is what Riffusion does.

Image Credits: Devin Coldewey

As the music progresses, it becomes louder across the board, and if you know what to listen for, you can even make out specific notes and instrumentation. By no means is the technique flawless or lossless, but it accurately and methodically represents the sound. And by following the same procedure backward, you may convert it to sound once more.

It is feasible to use diffusion models to condition creators’ works on various visuals in addition to a text prompt. This is tremendously helpful for changing sounds while keeping the original clip’s structure intact. The denoising intensity option determines how much the original clip will depart from the new prompt.

Consider that we enter a prompt and produce 100 clips with various seeds. The resulting clips can’t be concatenated because they have different downbeats, tempos, and keys.

The researchers smoothly interpolate between prompts and seeds in the model’s latent space in order to remedy this. The latent space in diffusion models is a feature vector that contains every conceivable outcome the model is capable of producing. Every numerical value in the latent space decodes to a workable output, and similar items are close to one another.

The important thing is that you can use two separate seeds or two distinct prompts with the same seed to sample the latent space between them.

To tie everything together, the researchers created an interactive web application that allows users to enter commands and infinitely generate interpolated content in real-time while viewing the spectrogram timeline in 3D.

The audio seamlessly switches to the new prompt as the user fills in new prompts. The program will interpolate between several seeds of the same prompt if there is no fresh prompt. With a translucent playhead, spectrograms are shown as 3D height maps along a timeline. 

AI-generated music is already a cutting-edge concept, but Riffusion elevates it with a brilliant, peculiar method that creates bizarre and intriguing music utilizing images of audio rather than actual audio. With diffusion generating more new and unique music has been made possible.

Check out the Tool and Code. All Credit For This Research Goes To Researchers on This Project. Also, don’t forget to join our Reddit page and discord channel, where we share the latest AI research news, cool AI projects, and more.

Rishabh Jain, is a consulting intern at MarktechPost. He is currently pursuing in computer sciences from IIIT, Hyderabad. He is a Machine Learning enthusiast and has keen interest in Statistical Methods in artificial intelligence and Data analytics. He is passionate about developing better algorithms for AI.

🐝 Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...