Nvidia’s New Technique — Called Adaptive Discriminator Augmentation (ADA) — Allows Researchers To Train AI Models Using Limited Datasets

Nvidia has introduced a new method for training AI models on limited datasets. Using a fraction of the study material that a typical generative adversarial network (GAN) requires, a model can now learn complex skills, whether recreating images of cancer tissue or emulating famous painters.

Researchers at Nvidia generated new artwork after training on fewer than 1,500 images from the Metropolitan Museum of Art. This was made possible by applying a new neural network training technique to the StyleGAN2 model.

StyleGAN2 is Nvidia’s open-source GAN. Like other GANs, it consists of two networks trained together: a generator that creates synthetic images and a discriminator that learns what realistic images should look like based on the training dataset. Nvidia’s other GAN research projects include GauGAN (an AI painting app), GameGAN (a game engine mimicker), and GANimal (a pet photo transformer).

Adaptive discriminator augmentation (ADA) is a technique that reduces the number of training images needed by a factor of 10 to 20 while still producing high-quality results. The method could contribute significantly to healthcare, for example by creating cancer histology images to train other AI models.


Problems associated with training GANs on a limited dataset:

Most GANs follow a basic principle: the more training data, the better the model. The discriminator coaches the generator by providing pixel-by-pixel feedback that helps make the synthetic images more realistic. But the discriminator can’t help the generator reach its full potential if training data is limited.

Training a high-quality GAN typically takes 50,000 to 100,000 training images. In some situations, however, it is not possible to collect that many sample images. With limited training data, many GANs fail to produce realistic results. This problem is called overfitting: the discriminator simply memorizes the training images and fails to provide useful feedback to the generator.

In image classification, researchers address overfitting with data augmentation. This technique expands smaller datasets with distorted copies of existing images (for example, rotated or flipped versions), pushing the model to generalize better. However, earlier attempts at applying augmentation to GAN training images resulted in a generator that mimicked those distortions instead of creating credible synthetic images.
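
As a rough illustration of data augmentation in general (not Nvidia’s code), the sketch below expands a tiny dataset with flipped and rotated copies. Images are represented as plain nested lists, and the helper names `horizontal_flip`, `rotate_90` and `augment_dataset` are hypothetical, chosen only for this example.

```python
import random

def horizontal_flip(image):
    """Mirror each row of a 2D image given as a list of rows."""
    return [list(reversed(row)) for row in image]

def rotate_90(image):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment_dataset(images, copies_per_image=2, seed=0):
    """Return the original images plus randomly distorted copies of each one.

    Hypothetical helper: a real pipeline would use a much richer set of
    distortions (crops, color changes, and so on).
    """
    rng = random.Random(seed)
    transforms = [horizontal_flip, rotate_90]
    # Start with copies of the originals so the inputs are never mutated.
    augmented = [[list(row) for row in img] for img in images]
    for img in images:
        for _ in range(copies_per_image):
            transform = rng.choice(transforms)
            augmented.append(transform(img))
    return augmented

if __name__ == "__main__":
    tiny_dataset = [[[1, 2], [3, 4]]]
    bigger = augment_dataset(tiny_dataset, copies_per_image=2)
    print(len(bigger), "images after augmentation")
```

If distorted copies like these are fed straight into GAN training, the generator can learn to reproduce the flips and rotations themselves, which is the failure mode described above.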

Nvidia’s ADA method applies data augmentations adaptively:

Nvidia’s ADA method resolves this issue by applying augmentations adaptively: the amount of distortion is adjusted at different points in the training process to keep the discriminator from overfitting. With ADA, the StyleGAN2 neural network learned from fewer training images and still produced excellent results.
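
The adjustment can be pictured as a small feedback controller. The sketch below is loosely modeled on the overfitting heuristic described in the linked paper and is not the official implementation: it estimates how confidently the discriminator labels training images as real, then nudges the augmentation probability `p` up or down. The function names and the `target` and `step` values are illustrative assumptions.

```python
def overfitting_indicator(real_logits):
    """Mean sign of the discriminator outputs on real training images.

    A value near 0 means the discriminator is unsure; a value near 1 means
    it confidently labels every training image as real, which suggests
    it has started to memorize them.
    """
    signs = [1 if x > 0 else -1 if x < 0 else 0 for x in real_logits]
    return sum(signs) / len(signs)

def update_augment_probability(p, real_logits, target=0.6, step=0.01):
    """Nudge the augmentation probability toward less overfitting.

    target and step are illustrative assumptions for this sketch.
    """
    r = overfitting_indicator(real_logits)
    if r > target:
        p += step   # discriminator too confident: distort more often
    elif r < target:
        p -= step   # discriminator not overfitting: distort less often
    return min(1.0, max(0.0, p))  # keep p a valid probability

if __name__ == "__main__":
    p = 0.0
    # Hypothetical discriminator outputs on a batch of real images.
    confident_batch = [2.0, 1.5, 3.0, 0.7]
    for _ in range(5):
        p = update_augment_probability(p, confident_batch)
    print(f"augmentation probability after 5 updates: {p:.2f}")
```

In a training loop, each image shown to the discriminator would then be distorted with probability `p`, so the strength of the augmentation rises only when signs of overfitting appear.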

Artists have used different versions of StyleGAN to create remarkable exhibits and to produce a new manga in the style of the famous illustrator Osamu Tezuka. Adobe has also adopted it to power Photoshop’s new AI tool, Neural Filters.

At present, very few medical images exist for many rare diseases, including some cancers, so it is challenging to train an AI that could help physicians detect them. With ADA, researchers can now apply GANs in fields where data is time-consuming and difficult to obtain.

A GAN trained with ADA can create synthetic images that serve as training data for another AI model that spots rare conditions in pathology images or MRI studies. It also makes it easier for healthcare institutions to share AI-generated data, as there are no patient data or privacy concerns.

Paper: https://arxiv.org/pdf/2006.06676.pdf

GitHub: https://github.com/NVlabs/stylegan2-ada

Source: https://blogs.nvidia.com/blog/2020/12/07/neurips-research-limited-data-gan