NVIDIA Introduces GauGAN2: An AI Model That Converts Text Into Images

NVIDIA has introduced a new AI model called GauGAN2, the successor to its well-known GauGAN model. This version lets users create lifelike landscape images of scenes that don’t exist.

GauGAN2’s deep learning model lets anyone turn an imagined scene into a photorealistic image from a short phrase. Type “sunset at a beach,” and the AI generates the scene in real time. If you add an adjective, as in “sunset at a rocky beach,” or replace “sunset” with “afternoon” or “rainy day,” the model, which is based on generative adversarial networks (GANs), updates the image instantly.

With the click of a button, users can generate a segmentation map, a high-level outline that shows where objects are placed in the scene. From there they can switch to drawing, refining the scene with rough sketches using labels such as sky, tree, rock, and river, which the smart paintbrush then turns into finished imagery.
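To make the idea of a segmentation map concrete, here is a minimal sketch in Python. It treats the map as a small grid in which every cell holds a label ID, and it paints rough rectangular "strokes" onto that grid. The label IDs, the grid size, and the helper names (blank_map, paint_rect, label_counts) are illustrative assumptions, not part of GauGAN2 or any NVIDIA API; the article does not describe how the demo encodes its maps.

```python
# Hypothetical label IDs: the article names these labels but not their encoding.
LABELS = {"sky": 0, "tree": 1, "rock": 2, "river": 3}


def blank_map(width, height, label="sky"):
    """Return a height x width grid where every cell holds the same label ID."""
    return [[LABELS[label]] * width for _ in range(height)]


def paint_rect(seg_map, label, top, left, bottom, right):
    """Paint rows [top, bottom) and columns [left, right) with one label,
    a stand-in for a rough brush stroke. Out-of-range bounds are clipped."""
    for row in range(max(top, 0), min(bottom, len(seg_map))):
        for col in range(max(left, 0), min(right, len(seg_map[row]))):
            seg_map[row][col] = LABELS[label]
    return seg_map


def label_counts(seg_map):
    """Count how many cells carry each label name."""
    names = {label_id: name for name, label_id in LABELS.items()}
    counts = {name: 0 for name in LABELS}
    for row in seg_map:
        for cell in row:
            counts[names[cell]] += 1
    return counts


# Start with an all-sky scene, then rough in ground, water, and a tree.
seg = blank_map(width=8, height=6)
paint_rect(seg, "rock", top=4, left=0, bottom=6, right=8)
paint_rect(seg, "river", top=5, left=2, bottom=6, right=6)
paint_rect(seg, "tree", top=2, left=1, bottom=4, right=3)

for row in seg:
    print(" ".join(str(cell) for cell in row))
print(label_counts(seg))
```

A generator such as GauGAN2 would take a map like this as a layout constraint and fill each labeled region with matching texture; that step is outside the scope of this sketch.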

The new GauGAN2 text-to-image capability is now available on NVIDIA AI Demos, a site where visitors can experience AI through demos from NVIDIA Research. Because it accepts both text prompts and sketches, GauGAN2 lets users create and customize scenes more quickly and with greater flexibility.

Adding text-to-image capability makes the latest version of GauGAN more flexible and much faster to customize: typing a phrase is quicker and easier than even a rough sketch. It is one of the first AI models to combine multiple modalities (text, semantic segmentation, sketch, and style) within a single GAN.

The GauGAN2 model was trained on 10 million high-quality landscape photographs using the NVIDIA Selene supercomputer, an NVIDIA DGX SuperPOD system that is among the ten most powerful supercomputers in the world. The researchers used a neural network that learns the relationship between words such as “winter,” “foggy,” and “rainbow” and the visuals they correspond to.

The neural network behind GauGAN2 produces a greater variety and higher quality of images than state-of-the-art models for text-to-image or segmentation map-to-image applications. The GauGAN2 research demo shows what powerful image-generation tools for artists could look like in the future. One example is the NVIDIA Canvas app, which is built on GauGAN technology and is free to download for anyone with an NVIDIA RTX GPU.

Because it integrates segmentation mapping, inpainting, and text-to-image generation in a single model, GauGAN2 is a powerful tool for creating photorealistic art from a mix of words and drawings. A user can start from a text prompt, such as “snow-capped mountain range,” and then modify the result by drawing. GauGAN2 then generates an updated image. Trees can be added, objects can be made taller or resized, clouds can be added to the sky, and much more.

The demo is one of the first to combine multiple modalities (text, semantic segmentation, sketch, and style) within a single GAN framework. This makes it faster and easier to turn an artist’s vision into a high-quality AI-generated image.

Rather than sketching out every detail of an imagined landscape, users can enter a short phrase, such as “snow-capped mountain range,” to quickly generate the main features and theme of an image. This starting point can then be adjusted with sketches to add a few trees in the foreground, put clouds in the sky, or make a mountain taller.

Consider recreating a landscape from Tatooine, the legendary planet of the Star Wars universe, which has two suns. The phrase “desert hills sun” is all that is needed to create a starting point, after which users can quickly sketch in a second sun. The process is iterative: every word typed into the text field adds to the AI-generated image.

Demo: http://gaugan.org/gaugan2/


  • https://blogs.nvidia.com/blog/2021/11/22/gaugan2-ai-art-demo/
  • https://venturebeat.com/2021/11/22/nvidias-latest-ai-tech-translates-text-into-landscape-images/

Prathamesh Ingle is a Consulting Content Writer at MarktechPost. He is a Mechanical Engineer and works as a Data Analyst. He is also an AI practitioner and certified Data Scientist with an interest in applications of AI. He is enthusiastic about exploring new technologies and advancements and their real-life applications.