Stability AI has unveiled Stable Audio, a model that generates custom audio clips from simple text prompts. The company is best known for its text-to-image model, Stable Diffusion, and is now extending that expertise to music and audio. The launch follows its work on image composition with the SDXL base model for Stable Diffusion.
Until now, generating base audio tracks has typically relied on ‘symbolic generation’ techniques, often involving MIDI files. Stable Audio goes further by letting users create entirely new musical compositions, free of the repetitive notes commonly associated with MIDI and symbolic generation. It does so by working directly with raw audio samples, which leads to higher output quality. The model was trained on over 800,000 pieces of licensed music from the AudioSparx library. This dataset supplies both high-quality audio and comprehensive metadata, which is critical for a text-based model.
Unlike image generation models that can emulate the style of specific artists, Stable Audio does not attempt to mimic specific acts such as The Beatles. This deliberate choice stems from the understanding that musicians want to pursue their own creative direction rather than imitate an existing style. Instead, Stable Audio is meant to let users explore their own musical expression.
Stable Audio is a diffusion model with approximately 1.2 billion parameters, roughly on par in size with the original Stable Diffusion model for image generation. The text model that interprets prompts was built and trained by Stability AI using the Contrastive Language Audio Pretraining (CLAP) technique. To help users write effective prompts, Stability AI is releasing a prompt guide alongside the Stable Audio launch.
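To give a rough sense of what contrastive language-audio pretraining optimizes, here is a minimal sketch of a symmetric contrastive loss over paired text and audio embeddings. It is not Stability AI's implementation: the toy embedding vectors, the function names, and the temperature of 0.07 are all assumptions chosen for illustration. The idea is only that matching text and audio pairs should score higher than mismatched ones.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_logits(text_embs, audio_embs, temperature):
    """Cosine similarity of every text embedding with every audio embedding."""
    texts = [normalize(v) for v in text_embs]
    audios = [normalize(v) for v in audio_embs]
    return [
        [sum(t * a for t, a in zip(text, audio)) / temperature for audio in audios]
        for text in texts
    ]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(text_embs, audio_embs, temperature=0.07):
    """Symmetric loss: text-to-audio and audio-to-text directions averaged."""
    logits = similarity_logits(text_embs, audio_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Hypothetical toy embeddings: item i of each list describes the same clip.
text_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1]]
matched_audio = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
shuffled_audio = [matched_audio[1], matched_audio[0]]

aligned = contrastive_loss(text_embs, matched_audio)
misaligned = contrastive_loss(text_embs, shuffled_audio)
print(f"aligned pairs: {aligned:.4f}, shuffled pairs: {misaligned:.4f}")
```

The loss is lower when each prompt's embedding sits closest to its own clip, which is the property a text-conditioned audio model relies on when it turns a prompt into a conditioning signal.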
Stable Audio will be available in a free version and a Pro plan priced at $12 per month. The free version allows up to 20 generations per month, each producing a track of up to 20 seconds. The Pro version raises these limits to 500 generations per month and tracks of up to 90 seconds.
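For a quick comparison of the two plans, the short sketch below works out the maximum amount of audio each tier can produce in a month, using only the figures quoted above. The `Tier` class and its field names are hypothetical and do not correspond to any Stability AI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Plan limits as quoted in this article (hypothetical structure)."""
    name: str
    monthly_price_usd: float
    generations_per_month: int
    max_track_seconds: int

    def max_audio_seconds(self) -> int:
        """Upper bound on audio per month if every generation uses the full length."""
        return self.generations_per_month * self.max_track_seconds


FREE = Tier("Free", 0.0, 20, 20)
PRO = Tier("Pro", 12.0, 500, 90)

for tier in (FREE, PRO):
    minutes = tier.max_audio_seconds() / 60
    print(f"{tier.name}: up to {tier.max_audio_seconds()} s ({minutes:.1f} min) per month")

# Cost per generation on the Pro plan if the full quota is used.
pro_cost_per_generation = PRO.monthly_price_usd / PRO.generations_per_month
print(f"Pro cost per generation: ${pro_cost_per_generation:.3f}")
```

On these numbers the free tier tops out at 400 seconds of audio per month (under seven minutes), while the Pro plan allows up to 45,000 seconds (12.5 hours).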
In conclusion, Stability AI’s release of Stable Audio heralds a new era in audio generation technology. The company has provided a seamless platform for transforming text prompts into original audio clips by harnessing advanced AI techniques. This innovation expands the horizons of creative expression and demonstrates the potential for AI-powered music and audio production solutions. With its accessible pricing tiers, Stable Audio is poised to become a valuable tool for aspiring and professional audio creators.
Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is highly enthusiastic, with a keen interest in machine learning, data science, and AI, and is an avid reader of the latest developments in these fields.