Generative adversarial networks (GANs) have seen rapid improvements in the quality and resolution of the images they produce. These techniques are used in a wide range of applications, including image editing, domain translation, and video generation. While several ways to control a GAN's generative process have been found, relatively little is known about how the networks actually synthesize images.
In 2019, Nvidia released the second version of StyleGAN, fixing characteristic artifacts and further improving the quality of the generated images. StyleGAN, the first method of its kind to generate highly realistic images, was open-sourced in February 2019. According to the research paper, StyleGAN2 improved several methods and characteristics of its predecessor, with changes to both the model architecture and the training procedure.
But the network still needed to be prevented from exploiting unwanted positional side information. While border artifacts can be avoided simply by operating on slightly larger images, aliasing is much harder to remove. To address it, researchers at Nvidia turned to the classical Shannon-Nyquist signal-processing framework, treating the signals flowing through the network as bandlimited functions on a continuous domain. Eliminating all sources of positional reference requires that details can be generated equally well regardless of pixel coordinates. This implies continuous equivariance to sub-pixel translation, and optionally rotation, for each layer of the model, which is what enables precise control over detail.
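The aliasing problem the Shannon-Nyquist framework addresses can be seen in a tiny numerical demo (not StyleGAN3 code, just the underlying sampling fact): a frequency above half the sampling rate is indistinguishable from a lower one once sampled.

```python
import numpy as np

# Sampling at rate fs can only represent frequencies up to fs/2 (Nyquist).
# A 7 Hz sine sampled at 10 Hz aliases onto a 3 Hz sine (10 - 7 = 3): the
# discrete samples are identical, so the high frequency is unrecoverable.
fs = 10
t = np.arange(0, 1, 1 / fs)
high = np.sin(2 * np.pi * 7 * t)   # 7 Hz, above the 5 Hz Nyquist limit
low = -np.sin(2 * np.pi * 3 * t)   # 3 Hz alias with flipped phase
print(np.allclose(high, low))  # True
```

This is why a generator that lets layers create frequencies above the Nyquist limit of their feature maps ends up with details glued to pixel coordinates.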
To meet these requirements, the NVIDIA research team comprehensively overhauled all signal-processing aspects of the StyleGAN2 generator, naming the resulting model 'StyleGAN3'. Their contributions include the surprising finding that commonly used upsampling filters are not aggressive enough in suppressing aliasing, along with a principled solution that accounts for the effect of pointwise nonlinearities. They also show how, after this overhaul, a model based on 1×1 convolutions yields a strong, rotation-equivariant generator.
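The core trick for taming pointwise nonlinearities can be sketched in 1-D: evaluate the nonlinearity at a higher sampling rate, then low-pass filter back down. This is a simplified illustration of the idea (using an ideal FFT-based resampler and `abs` as the nonlinearity), not the paper's actual CUDA implementation.

```python
import numpy as np

def filtered_nonlinearity(x, up=4):
    """Apply a pointwise nonlinearity with temporary upsampling (1-D sketch).

    A nonlinearity such as abs() creates new high frequencies; evaluating it
    at `up` times the sampling rate and low-pass filtering before returning
    to the original rate keeps the result approximately bandlimited.
    """
    n = x.size
    # Upsample by zero-padding the spectrum (ideal sinc interpolation).
    X = np.fft.rfft(x)
    X_up = np.zeros(n * up // 2 + 1, dtype=complex)
    X_up[:X.size] = X
    x_up = np.fft.irfft(X_up, n * up) * up
    # Pointwise nonlinearity applied at the higher rate.
    y_up = np.abs(x_up)
    # Low-pass: keep only frequencies representable at the original rate.
    Y = np.fft.rfft(y_up)
    return np.fft.irfft(Y[:n // 2 + 1], n) / up
```

Applying `abs` directly to a sampled sine folds its harmonics back below the Nyquist limit; the upsample-filter-downsample version suppresses most of that aliased energy.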
Suppressing aliasing forces the model into a more natural hierarchical refinement, and its mode of operation changes drastically: details become correctly attached to underlying surfaces rather than to pixel coordinates. This advancement promises significant improvements for models that generate video and animation. The new generator (StyleGAN3) is more computationally intensive than StyleGAN2 but still achieves a competitive FID.
The new features and improvements are as follows:
- Alias-free generator architecture and training configurations
- Tools for interactive visualization (visualizer.py), spectral analysis (avg_spectra.py), and video generation
- Equivariance metrics
- General improvements: reduced memory usage, slightly faster training, bug fixes.
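An equivariance metric of the kind listed above can be illustrated with a toy EQ-T-style score (a hypothetical sketch, not the repository's metric code): translate the generator's output, generate from a translated input, and report the PSNR between the two.

```python
import numpy as np

def eqt_psnr(g, z, shifts, peak=2.0):
    """EQ-T-style score (sketch): PSNR between (a) translating the output of
    generator g and (b) generating from a translated input, averaged over
    several integer pixel shifts. Higher PSNR means g is closer to
    translation equivariant. `peak` is the output dynamic range
    (e.g. 2.0 for images in [-1, 1])."""
    errs = []
    for dx in shifts:
        a = np.roll(g(z), dx, axis=-1)    # translate the generated image
        b = g(np.roll(z, dx, axis=-1))    # generate from a translated input
        errs.append(np.mean((a - b) ** 2))
    mse = np.mean(errs)
    return 10 * np.log10(peak ** 2 / mse) if mse > 0 else np.inf

# A purely pointwise "generator" commutes with translation exactly,
# so its score is infinite:
z = np.random.default_rng(1).standard_normal((3, 32, 32))
print(eqt_psnr(np.tanh, z, shifts=[1, 5, 9]))  # inf
```

StyleGAN3's actual metrics also cover sub-pixel translation and rotation, which require continuous interpolation rather than the integer `np.roll` used here.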