Researchers at the University of Waterloo have recently developed LyricJam, a computational system that produces lyrics for live instrumental music. The system could help musicians write fresh lyrics that fit the music they play.
Initially, the team (Vechtomova and her colleagues) built a system that learns features of an artist's lyrical style by analyzing audio recordings of their songs together with their lyrics. That system then uses what it has learned to generate lyrics in the style of the artist. The researchers next explored generating lyrics for pre-recorded instrumental audio clips, and in their latest study they took this a step further by designing a system that produces suitable lyrics for live music. The goal was a system whose generated lyrics match the mood and emotions the music conveys through characteristics such as chords, instruments, and tempo.
The LyricJam software is simple to use: a musician performs live, and the system displays lyric lines that it generates in real time in response to the music it hears. The generated lines are retained for the duration of the session, so the artist can review them once the session is over.
The researchers’ approach transforms raw audio into spectrograms, then uses deep learning models to generate lyrics that correspond to the music being processed in real time. The model’s architecture comprises two variational autoencoders, one that learns representations of the music audio and one that learns representations of the lyrics. The researchers devised two new mechanisms to align the two autoencoders’ representations of music and lyrics; these mechanisms enable the system to learn which kinds of lyrics go well with which kinds of instrumental music.
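The preprocessing step described above, turning raw audio into a spectrogram, can be sketched with standard signal-processing tools. The snippet below is a minimal illustration using SciPy's short-time Fourier analysis; the sample rate, window size, and log scaling are illustrative assumptions, not the settings used by the LyricJam authors.

```python
import numpy as np
from scipy import signal

def audio_to_spectrogram(waveform, sample_rate=22050, nperseg=1024, noverlap=512):
    """Convert a mono waveform into a log-magnitude spectrogram.

    Parameter values are placeholders for illustration; LyricJam's
    actual preprocessing settings are not specified here.
    """
    freqs, times, sxx = signal.spectrogram(
        waveform, fs=sample_rate, nperseg=nperseg, noverlap=noverlap
    )
    # Log-scale the power so quiet and loud passages remain comparable.
    return np.log1p(sxx)

# Example: two seconds of a 440 Hz sine tone as stand-in "live audio".
sr = 22050
t = np.linspace(0, 2, 2 * sr, endpoint=False)
spec = audio_to_spectrogram(np.sin(2 * np.pi * 440 * t), sample_rate=sr)
print(spec.shape)  # (frequency bins, time frames)
```

An image-like array such as `spec` is the kind of input a spectrogram autoencoder consumes: each column is a short time slice, each row a frequency band.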
Overview of the LyricJam model
In Stage 1, the researchers trained a spectrogram variational autoencoder (VAE) to learn audio representations.
In Stage 2, they trained a conditional VAE (CVAE) to learn the representations of lyrics conditioned on their corresponding audio clips.
Lastly, in Stage 3, an alignment model based on a generative adversarial network (GAN) was trained to align the lyric and audio representations. At inference time, a music audio clip recorded in real time is converted into a spectrogram, which the model uses to generate new lyrics matching the music.
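Both the VAE of Stage 1 and the CVAE of Stage 2 rely on the same core machinery: an encoder maps an input to a latent mean and log-variance, a latent vector is drawn via the reparameterization trick, and a KL term regularizes the latent space. The NumPy sketch below illustrates only that core step; the single linear "encoder" and all dimensions are placeholders, not the architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w_mu, w_logvar):
    """Toy linear 'encoder': maps an input (e.g. a flattened spectrogram
    or a lyric embedding) to a latent mean and log-variance."""
    return x @ w_mu, x @ w_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps; in a real autodiff framework this
    keeps the draw differentiable with respect to mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    """KL(q(z|x) || N(0, I)), the regularizer in the VAE/CVAE loss."""
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)

# Placeholder dimensions: 64-dim inputs, 8-dim latent space.
x = rng.standard_normal((4, 64))            # batch of 4 "audio" inputs
w_mu = rng.standard_normal((64, 8)) * 0.1
w_logvar = rng.standard_normal((64, 8)) * 0.1

mu, logvar = encode(x, w_mu, w_logvar)
z = reparameterize(mu, logvar)              # one latent code per input
print(z.shape, kl_divergence(mu, logvar).shape)
```

In a conditional VAE like the one in Stage 2, the conditioning signal (here, the audio representation) would be concatenated to the lyric input before encoding; Stage 3's GAN-based alignment model then learns how audio latents relate to lyric latents.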
To evaluate the system, the team conducted a user study in which musicians performed live music and commented on the lyrics the system generated. Notably, most of the artists who took part described LyricJam as a non-judgmental jam partner that encouraged them to improvise and try out new musical expressions.
In the future, LyricJam could become a valuable tool for musicians and artists around the world, helping them create unique and intriguing lyrics for their songs. Vechtomova and her colleagues are currently working on a release version of the system that artists everywhere can use, as well as on other tools to support lyric creation.