In collaboration with Runway, the Machine Vision and Learning research group at LMU Munich, EleutherAI, and LAION, Stability AI created Stable Diffusion, a text-to-image model that generates artwork from text prompts. Stable Diffusion can produce photorealistic 512×512 pixel images from a text description of a scene.
The model is a major improvement in speed and quality, and it is efficient enough to run on consumer GPUs. As a result, anyone, not only academic researchers, can now run image generation in a variety of settings.
Following the publication of the code and a restricted release of the model weights to the research community, the weights are now being made available to the general public. The most recent version of Stable Diffusion can be downloaded and run on standard desktop PCs. The model can do more than convert text to a picture; it can also upscale images and transfer style from one image to another. With this launch, Stability AI has also made available a beta version of DreamStudio, its web-based user interface and API for the model.
Stable Diffusion supports several workflows. Like DALL-E, it can be prompted to produce an image that closely matches a given text description. It can also produce a photorealistic image from just a rough sketch and a short written description.
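As a rough illustration of the text-to-image workflow, the sketch below assumes the Hugging Face `diffusers` library, PyTorch, a CUDA-capable GPU, and the `CompVis/stable-diffusion-v1-4` checkpoint; none of these are specified in the announcement. `TextToImageRequest` and `generate` are hypothetical helper names introduced only for this example, and the default step count and guidance scale are illustrative values, not recommendations from the Stability AI team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToImageRequest:
    """Hypothetical container for the settings of one text-to-image call."""

    prompt: str
    height: int = 512               # output size described in the article
    width: int = 512
    num_inference_steps: int = 50   # assumed default, not from the article
    guidance_scale: float = 7.5     # assumed default, not from the article

    def __post_init__(self):
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        # The diffusers pipeline rejects sizes that are not multiples of 8.
        if self.height % 8 or self.width % 8:
            raise ValueError("height and width must be multiples of 8")

    def to_kwargs(self):
        """Return keyword arguments accepted by StableDiffusionPipeline."""
        return {
            "prompt": self.prompt,
            "height": self.height,
            "width": self.width,
            "num_inference_steps": self.num_inference_steps,
            "guidance_scale": self.guidance_scale,
        }


def generate(request, model_id="CompVis/stable-diffusion-v1-4", device="cuda"):
    """Run one text-to-image generation and return a PIL image.

    Heavy dependencies are imported lazily so the request helper above
    can be used and validated without torch or diffusers installed.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision reduces memory use on consumer GPUs.
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe = pipe.to(device)
    result = pipe(**request.to_kwargs())
    return result.images[0]
```

With the weights downloaded, a call such as `generate(TextToImageRequest(prompt="a watercolor of a lighthouse at dusk")).save("lighthouse.png")` would write one 512×512 image to disk. The sketch-plus-description workflow uses a separate image-to-image pipeline in the same library and is not shown here.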
Further, the Stability AI team collaborated with Hugging Face and CoreWeave to deliver the following as part of the release:
- The model is being made available under the CreativeML OpenRAIL-M license, which permits both commercial and non-commercial use. Users must ensure that the model is not used in a way that breaks the law, and the license must be included in any distribution of the model. Any service built on the model must also make the license terms available to its end users.
- An AI-based Safety Classifier is included as a standard component of the software package. It evaluates concepts and other factors in generated images to filter out results that the model’s user might find undesirable.
The model was trained on image-text pairs from a broad internet scrape, which means it may reproduce some social biases and produce harmful content. The team believes that open mitigation measures and open debate about those biases can help improve the model. They therefore encourage everyone to use this resource responsibly and to take part in the community and related discussions to help improve the technology.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advancements in technology and their real-life applications.