Historically, music has served as a powerful indicator of human artistic endeavor. Currently, the confluence of traditional musical constructs and computational methodologies is particularly evident. Deep learning, characterized by advanced algorithms and expansive neural networks, is emerging as a potent tool in the domain of music composition. This approach not only automates the generation of melodies and harmonies but also represents a synthesis of human musical insight and computational rigor.
The research community has proposed several methods for automatic music generation. Traditional techniques rely on predefined algorithms, while learned models, such as RNNs and their more capable variant LSTMs, train on past notation to produce new sequences. Another influential approach is Generative Adversarial Networks (GANs), in which two neural networks, a generator and a discriminator, are trained against each other so the generator learns to produce increasingly convincing musical data. WaveNet, introduced by Google DeepMind, offers a different perspective by modeling raw audio waveforms directly. Despite these advancements, the challenge lies in crafting music that combines technical correctness with auditory appeal.
In this context, a research team from India recently published a paper on creating music that people genuinely enjoy. Notably, the goal is not to produce professional-grade compositions. Instead, the focus is on recognizing musical patterns in order to craft decent, melodious, enduring, and aurally pleasing melodies.
Concretely, the research team proposed a method based on a multi-layer LSTM model and focused on ABC notation, an efficient ASCII musical representation. The method uses a dataset amalgamating tunes from two instruments and five composers, processed with integer encoding and one-hot encoding. In the architecture, the LSTM serves as the backbone. It is supplemented by a dropout layer to curb overfitting and a time-distributed dense layer to process the output at each timestep. A softmax classifier then produces a probability for each possible musical note, with the Adaptive Moment Estimation (Adam) optimizer refining the learning process. Post-training, the LSTM iteratively samples from these probabilities to generate novel musical sequences.
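The preprocessing step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the tiny ABC fragment and vocabulary are toy examples standing in for the paper's actual dataset.

```python
import numpy as np

# Toy ABC-notation excerpt (illustrative only, not from the paper's dataset).
abc_fragment = "CDEFGABc"

# Build a vocabulary of the distinct symbols in the corpus.
vocab = sorted(set(abc_fragment))
char_to_int = {ch: i for i, ch in enumerate(vocab)}

# Integer encoding: each symbol becomes its index in the vocabulary.
encoded = np.array([char_to_int[ch] for ch in abc_fragment])

# One-hot encoding: each integer becomes a basis vector of length |vocab|.
one_hot = np.eye(len(vocab))[encoded]

print(list(encoded))   # [2, 3, 4, 5, 6, 0, 1, 7] (sorted by ASCII order)
print(one_hot.shape)   # (8, 8)
```

One-hot vectors of this form are what the LSTM consumes at each timestep; at generation time, the model's softmax output over the same vocabulary can be sampled (e.g. with `np.random.choice`) to pick the next note.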
To evaluate the proposed approach's efficacy, the model was trained over 150 epochs, reaching 95% training accuracy. Accuracy rose from an initial 73% at 20 epochs, with a marked improvement from the 40th epoch onwards. In-depth analyses were then conducted on the model's output. Autocorrelation revealed consistent patterns, suggesting the music had structured repetition. Power Spectral Density (PSD) analysis highlighted dominant energy in specific frequency ranges, with the generated music centered around 565.38 Hz, a frequency the authors describe as relaxing. Noise reduction was performed with a Butterworth low-pass filter, effectively minimizing noise interference and ensuring high-quality output. Based on these metrics and analyses, the model's performance was commendable, producing structured, quality music with minimal noise.
In conclusion, the authors developed a model capable of autonomously composing melodious music using a multi-layer LSTM network. The model could recall patterns from the training data, allowing it to generate polyphonic music while reaching 95% training accuracy. The research emphasized the potential of deep learning in music generation and its influence on listeners. Future work might include techniques for predicting emotional undertones in music through audio pattern analysis, aiming to refine the interaction between AI and humans by seamlessly incorporating music-generation technologies into daily life.
Check out the Paper. All credit for this research goes to the researchers on this project.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor's degree in physical science and a master's degree in telecommunications and networking systems. His current research concerns computer vision, stock market prediction, and deep learning. He has produced several scientific articles on person re-identification and on the robustness and stability of deep networks.