OpenAI recently launched Jukebox, a model that generates music, including singing, in the raw audio domain. Raw audio has a very long context, so Jukebox first uses an autoencoder, a multiscale VQ-VAE, to compress the audio into discrete codes, and then models those codes with autoregressive Transformers.
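To show why compression matters, here is a rough back-of-the-envelope calculation in shell. It assumes a 4-minute song sampled at 44.1 kHz and three VQ-VAE levels that compress the audio by 8x, 32x and 128x (the factors we understand the Jukebox paper to report). Treat the numbers as illustrative, not as Jukebox's exact internals.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope: how many timesteps does each VQ-VAE level produce?
# Assumptions (illustrative only): a 4-minute song at 44.1 kHz, and
# compression factors of 8x, 32x and 128x for the three levels.
set -eu

sample_rate=44100   # raw audio samples per second
duration_s=240      # 4 minutes
raw_samples=$(( sample_rate * duration_s ))
echo "raw audio samples: ${raw_samples}"

for hop in 8 32 128; do
  # Integer division: each discrete code stands in for `hop` raw samples.
  echo "codes at ${hop}x compression: $(( raw_samples / hop ))"
done
```

Going from roughly 10.6 million raw samples to about 83,000 codes at the most compressed level is what makes autoregressive modeling with Transformers tractable.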
Given a genre, an artist, and lyrics as input, Jukebox can generate a new music sample from scratch. This pushes the boundaries of what generative models can do. Jukebox can produce audio pieces several minutes long with recognizable singing in natural-sounding voices. Listen to the Jukebox-generated country song at the end of this article.
# Required: Sampling
conda create --name jukebox python=3.7.5
conda activate jukebox
conda install mpi4py=3.0.3
conda install pytorch=1.4 torchvision=0.5 cudatoolkit=10.0 -c pytorch
git clone https://github.com/openai/jukebox.git
cd jukebox
pip install -r requirements.txt
pip install -e .

# Required: Training
conda install av=7.0.01 -c conda-forge
pip install ./tensorboardX

# Optional: Apex for faster training with fused_adam
conda install pytorch=1.1 torchvision=0.3 cudatoolkit=10.0 -c pytorch
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex
Sample Explorer: https://jukebox.openai.com/