Researchers Propose ‘Projected-GANs’ to Improve Image Quality, Sample Efficiency, and Convergence Speed

Generative Adversarial Networks (GANs) are an approach to generative modeling that uses deep learning methods such as convolutional neural networks (1). A GAN has two components: a generator and a discriminator. In image synthesis, the generator’s task is to produce an RGB image, while the discriminator aims to distinguish real samples from generated ones. Although GANs produce high-quality images, they are challenging to train: they need substantial computing power, careful regularization, and extensive hyperparameter tuning.
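To make the two-player setup concrete, here is a minimal sketch of the classic minimax GAN losses, written on raw discriminator logits. It is an illustration only, not the loss used in the Projected-GAN paper, and the function names are hypothetical. It also shows one well-known reason training is hard: when the discriminator confidently rejects a generated sample, the generator's gradient under this loss shrinks toward zero.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_loss(real_logit, fake_logit):
    """-[log D(x) + log(1 - D(G(z)))] for one real and one generated sample."""
    return -(math.log(sigmoid(real_logit)) + math.log(1.0 - sigmoid(fake_logit)))

def generator_loss_minimax(fake_logit):
    """log(1 - D(G(z))): the generator tries to minimize this value."""
    return math.log(1.0 - sigmoid(fake_logit))

def generator_grad_minimax(fake_logit):
    """Derivative of log(1 - sigmoid(l)) with respect to the logit l, which is -sigmoid(l)."""
    return -sigmoid(fake_logit)

# Compare an undecided discriminator with one that confidently rejects the sample.
# Note: this toy version is not numerically hardened for very large positive logits.
for logit in (0.0, -5.0, -10.0):
    print(logit, generator_loss_minimax(logit), generator_grad_minimax(logit))
```

The printed gradients get smaller in magnitude as the logit becomes more negative, which is the regime where the generator receives little useful feedback.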

Researchers from the University of Tübingen, the Max Planck Institute for Intelligent Systems, and Heidelberg University have studied ways to improve GAN training by using pre-trained representations. They propose a more effective strategy, Projected-GAN, that mixes features across channels and resolutions.

Pre-trained representations have become a standard tool in computer vision and natural language processing. Pre-trained perceptual networks have been combined with GANs for image-to-image translation with strong results. Still, this idea has not yet materialized in unconditional noise-to-image synthesis.

The researchers explain that a naive application of this idea does not lead to state-of-the-art results: strong pre-trained features let the discriminator dominate the two-player game, which results in vanishing gradients for the generator. They show that two components are critical for exploiting the full potential of pre-trained perceptual feature spaces: feature pyramids, which enable multi-scale feedback through multiple discriminators, and random projections, which make better use of the deeper layers of the pre-trained network.
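The two components can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the feature pyramid is filled with random numbers instead of activations from a real pre-trained network, the initialization scale is an arbitrary choice, and `make_fixed_projection`, `project`, `toy_discriminator`, and `multi_scale_logit` are hypothetical names. Each scale gets its own fixed random channel-mixing matrix (applied at every spatial location, like a 1x1 convolution) and its own discriminator, and the per-scale outputs are summed.

```python
import random

def make_fixed_projection(c_in, c_out, seed):
    """Random c_out x c_in channel-mixing matrix, drawn once and never trained."""
    rng = random.Random(seed)
    scale = (2.0 / c_in) ** 0.5  # assumed init scale, for illustration only
    return [[rng.gauss(0.0, scale) for _ in range(c_in)] for _ in range(c_out)]

def project(weights, feature_map):
    """Apply the same matrix to the channel vector at every spatial location."""
    return [[sum(w * x for w, x in zip(row, pixel)) for row in weights]
            for pixel in feature_map]

def toy_discriminator(projected):
    """Hypothetical stand-in for a per-scale discriminator: mean activation as a logit."""
    values = [v for pixel in projected for v in pixel]
    return sum(values) / len(values)

def multi_scale_logit(pyramid, projections):
    """Sum the feedback of one discriminator per pyramid level."""
    return sum(toy_discriminator(project(w, f)) for f, w in zip(pyramid, projections))

# Fake two-level pyramid: each level is a list of locations, each a channel vector.
rng = random.Random(0)
pyramid = [
    [[rng.random() for _ in range(8)] for _ in range(16)],   # 4x4 locations, 8 channels
    [[rng.random() for _ in range(16)] for _ in range(4)],   # 2x2 locations, 16 channels
]
projections = [make_fixed_projection(8, 4, seed=1),
               make_fixed_projection(16, 4, seed=2)]
logit = multi_scale_logit(pyramid, projections)
print(logit)
```

In a real setup the projections would stay frozen while only the generator and the per-scale discriminators are trained; here that is reflected only in the fact that the matrices are created once from a fixed seed.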

According to the researchers, Projected-GANs significantly reduce the amount of data that needs to be processed during training, which makes the system more efficient and avoids expensive hyperparameter sweeps. They evaluated the approach on small and large datasets at resolutions up to 1024² pixels and report state-of-the-art image synthesis results with significantly less training time.

The comparison with state-of-the-art models covers three settings: (i) convergence speed and data efficiency, (ii) large datasets, and (iii) small datasets. Projected-GAN achieved a low FID (Fréchet Inception Distance) on all datasets.
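For readers unfamiliar with the metric: FID compares the mean and covariance of feature statistics for real and generated images, and lower is better. The general formula is ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). The sketch below is a simplified version that assumes diagonal covariances, so the matrix square root reduces to an element-wise one. It is meant only to show the structure of the metric, not to reproduce the paper's numbers, and `fid_diagonal` is a hypothetical name.

```python
import math

def fid_diagonal(mu_r, var_r, mu_g, var_g):
    """Fréchet distance between two Gaussians, assuming diagonal covariances.

    mu_* are per-dimension means and var_* per-dimension variances of the
    feature statistics for real (r) and generated (g) images.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term

# Identical statistics give a distance of zero; any mismatch increases it.
same = fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
shifted = fid_diagonal([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
print(same, shifted)
```

A full FID computation would use complete covariance matrices of Inception features and a matrix square root, which typically requires a numerical library.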


The research team has open-sourced the project. More details are available in the research paper and the accompanying GitHub code.