NVIDIA and Tel Aviv Researchers Propose ‘StyleGAN-NADA’, A Text-Driven Method That Converts a Pre-Trained AI Generator to New Domains Using Only a Textual Prompt and No Training Data

Source: https://stylegan-nada.github.io/

Generative adversarial networks (GANs) have revolutionized image generation, and their learned representations have also helped with discriminative tasks such as classification and regression. They capture the distribution of images through a semantically rich latent space, which gives them an edge over older generative approaches such as autoencoders.

Although GANs have shown impressive results, their ability to generate high-quality images is limited by the amount of training data available. For example, it is hard to train a model on the style of an artist with only a handful of known works, or on an imaginary scene for which no images exist at all.

Recent work has shown that vision-language models can be paired with generative models to provide a simple text-driven interface for image generation. However, such methods build on a fixed pre-trained generator, which restricts the user to in-domain manipulations, that is, to edits that stay within the domain the generator was originally trained on.

Researchers from Tel Aviv University and NVIDIA introduce a text-driven method that enables out-of-domain generation. They call it ‘StyleGAN-NADA’, a CLIP-guided zero-shot method for Non-Adversarial Domain Adaptation of image generators. Simply put, the method shifts a pre-trained generator toward a new domain described only by a textual prompt, without requiring a single training image. The domain shift is achieved by modifying the generator’s weights so that the images it produces align with the driving text.


Instead of relying on adversarial training, the research group proposes a novel dual-generator training approach. The two generators share a joint latent space. One is kept frozen for the duration of training and provides context from the source domain across an effectively unlimited range of generated instances. The other is trained to shift each generated instance, individually, along a textually prescribed path in CLIP’s embedding space. To increase stability when the domain change is drastic, the researchers also introduce an adaptive training approach: CLIP is used to identify the most relevant layers at each iteration, and only those layers are trained.
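The per-instance shift and the layer selection described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: plain Python lists stand in for CLIP embeddings and per-layer latent codes, and the function names `directional_loss` and `select_layers` are hypothetical. The sketch assumes the loss rewards alignment between the image-space shift (frozen generator to trained generator) and the text-space shift (source prompt to target prompt), and that "relevant" layers are those whose latent codes moved the most during a short CLIP-guided probe.

```python
import math


def sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def directional_loss(img_frozen, img_trained, txt_source, txt_target):
    """1 - cos(image shift, text shift).

    All four arguments are embedding vectors (stand-ins for CLIP outputs).
    The loss is 0 when the trained generator's image moved in exactly the
    direction prescribed by the text, and 2 when it moved the opposite way.
    """
    delta_img = sub(img_trained, img_frozen)
    delta_txt = sub(txt_target, txt_source)
    return 1.0 - cosine(delta_img, delta_txt)


def select_layers(codes_before, codes_after, k):
    """Return the indices of the k layers whose codes changed the most.

    codes_before / codes_after hold one latent vector per layer
    (an assumed representation, chosen for illustration only).
    """
    change = [math.dist(b, a) for b, a in zip(codes_before, codes_after)]
    ranked = sorted(range(len(change)), key=lambda i: change[i], reverse=True)
    return sorted(ranked[:k])
```

In a real setup, the embeddings would come from CLIP's image and text encoders, and only the layers returned by the selection step would receive gradient updates for that iteration.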

The research group showed that the method works for artistic styles, cross-species identity transfer, and significant shape changes. They compared it with existing editing techniques that also require no training data and reported that their approach is more effective.

Paper: https://arxiv.org/pdf/2108.00946.pdf

Project: https://stylegan-nada.github.io/

Code: https://github.com/rinongal/StyleGAN-nada