3D-aware image synthesis has made rapid progress, but two problems remain. First, existing approaches may lack an underlying 3D representation or rely on view-inconsistent rendering, so the images they synthesize are not multi-view consistent. Second, they often depend on network architectures that are not expressive enough, which limits their ability to produce high-quality, realistic images.
A research group from Stanford University introduced a generative adversarial approach to unsupervised 3D representation learning called Periodic Implicit Generative Adversarial Networks (π-GAN, or pi-GAN). The authors report that it outperforms existing methods in image quality. π-GAN represents a radiance field with a SIREN network, a fully connected neural network with periodic activation functions, and conditions that field on a latent code. The conditioned radiance field maps a 3D location and a 2D viewing direction to a view-dependent radiance and a view-independent volume density. The radiance field can then be rendered from arbitrary camera poses with a differentiable volume renderer based on classical volume rendering techniques.
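The compositing step of classical volume rendering can be sketched in plain Python for a single ray. This is a minimal sketch of the standard NeRF-style quadrature, assuming the densities and colours have already been queried from the radiance field at sorted sample points along the ray. The function name `render_ray` is hypothetical, and a real implementation would batch this over many rays in an autodiff framework so that gradients flow back to the densities and colours.

```python
import math


def render_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (hypothetical helper).

    densities: non-negative volume densities sigma_i at each sample
    colors:    RGB tuples c_i at each sample
    deltas:    distances between adjacent samples
    Returns the composited RGB colour and the accumulated opacity.
    """
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    rgb = [0.0, 0.0, 0.0]
    for sigma, c, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution w_i = T_i * alpha_i
        for k in range(3):
            rgb[k] += weight * c[k]
        transmittance *= 1.0 - alpha            # light left after this segment
    return tuple(rgb), 1.0 - transmittance


# A semi-transparent red sample in front of a nearly opaque green one.
rgb, opacity = render_ray([math.log(2), 50.0],
                          [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
                          [1.0, 1.0])
# rgb is roughly (0.5, 0.5, 0.0) and opacity is roughly 1.0
```

Every operation here (exponentials, products, sums) is differentiable, which is what allows a GAN loss on rendered images to train the underlying 3D field.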
π-GAN is considerably more effective than previous 3D-aware approaches at producing multi-view-consistent images and rendering them from different camera angles. Its SIREN-based neural radiance field representation encourages vivid, realistic images whose depth is easy to interpret. Because SIREN uses periodic activation functions, it represents fine details better than ReLU-based representations, which lets π-GAN render sharper images than previous work.
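The periodic activation itself is simple. The sketch below implements a single SIREN layer in plain Python, following the original SIREN formulation y = sin(ω₀(Wx + b)) with ω₀ = 30 and its initialisation scheme (first-layer weights drawn from U(-1/n, 1/n), later layers from U(-√(6/n)/ω₀, √(6/n)/ω₀)). The class name `SineLayer`, the zero bias initialisation, and the use of Python lists instead of tensors are illustrative assumptions, not π-GAN's actual code.

```python
import math
import random


class SineLayer:
    """One SIREN layer: y = sin(omega_0 * (W x + b)), hypothetical plain-Python sketch."""

    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False, seed=0):
        rng = random.Random(seed)
        self.omega_0 = omega_0
        # SIREN initialisation: first layer U(-1/n, 1/n);
        # later layers U(-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0), with n = in_features.
        if is_first:
            bound = 1.0 / in_features
        else:
            bound = math.sqrt(6.0 / in_features) / omega_0
        self.weight = [[rng.uniform(-bound, bound) for _ in range(in_features)]
                       for _ in range(out_features)]
        # Assumption for this sketch: biases start at zero.
        self.bias = [0.0] * out_features

    def __call__(self, x):
        # Periodic activation applied to each scaled pre-activation.
        return [math.sin(self.omega_0 * (sum(w * xi for w, xi in zip(row, x)) + b))
                for row, b in zip(self.weight, self.bias)]


# Stack a first layer (3D coordinate input) and one hidden layer.
first = SineLayer(3, 16, is_first=True)
hidden = SineLayer(16, 16, seed=1)
features = hidden(first([0.1, -0.4, 0.25]))
```

The ω₀ factor lets small changes in the input coordinate produce large changes in the output, which is the property that helps sine networks capture fine, high-frequency detail.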
Beyond π-GAN itself, the researchers make two further technical contributions:
- The researchers observe that prior work has conditioned ReLU-based radiance fields by concatenating the input noise to one or more layers. However, conditioning-by-concatenation is suboptimal for implicit neural representations with periodic activations (SIRENs). Instead, they propose using a mapping network to condition the SIREN layers through feature-wise linear modulation (FiLM), a general-purpose conditioning technique.
- To offset the increased computational cost of 3D GANs, the researchers introduce a progressive growing strategy inspired by its earlier success in 2D convolutional GANs.
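FiLM conditioning can be sketched as follows: a mapping network turns the latent code z into a frequency vector γ and a phase-shift vector β, and the SIREN layer computes sin(γ ⊙ (Wx + b) + β). One way to see why this is more expressive than concatenation: appending z to the input x only adds a z-dependent offset to the first layer's pre-activation, whereas FiLM can also rescale it. This is a minimal plain-Python sketch assuming a two-layer mapping network with LeakyReLU and a single conditioned layer; the helper names (`linear`, `random_linear`, `MappingNetwork`, `film_siren_layer`), layer sizes, and weight ranges are hypothetical. A full generator would emit one (γ, β) pair per SIREN layer.

```python
import math
import random


def linear(weight, bias, x):
    """Dense layer W x + b, with W stored as a list of rows (hypothetical helper)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def random_linear(n_in, n_out, rng, scale=0.5):
    """Hypothetical helper: small random weights and zero biases."""
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


class MappingNetwork:
    """Maps a latent z to FiLM frequencies (gamma) and phase shifts (beta)."""

    def __init__(self, z_dim, hidden, rng):
        self.hidden = hidden
        self.l1 = random_linear(z_dim, 32, rng)
        self.l2 = random_linear(32, 2 * hidden, rng)

    def __call__(self, z):
        h = [max(0.2 * v, v) for v in linear(*self.l1, z)]  # LeakyReLU(0.2)
        out = linear(*self.l2, h)
        return out[:self.hidden], out[self.hidden:]  # (gamma, beta)


def film_siren_layer(weight, bias, x, gamma, beta):
    """FiLM-conditioned SIREN layer: sin(gamma * (W x + b) + beta), element-wise."""
    pre = linear(weight, bias, x)
    return [math.sin(g * p + bt) for g, p, bt in zip(gamma, pre, beta)]


rng = random.Random(0)
mapping = MappingNetwork(z_dim=4, hidden=8, rng=rng)
W, b = random_linear(3, 8, rng)
point = [0.1, -0.2, 0.3]

# The same 3D point yields different features under different latent codes.
gamma_a, beta_a = mapping([1.0, 0.0, 0.0, 0.0])
gamma_b, beta_b = mapping([0.0, 1.0, 0.0, 0.0])
feat_a = film_siren_layer(W, b, point, gamma_a, beta_a)
feat_b = film_siren_layer(W, b, point, gamma_b, beta_b)
```

With γ = 1 and β = 0 the layer reduces to an unconditioned sine layer, so FiLM strictly generalises the plain SIREN activation.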
The researchers obtained state-of-the-art 3D-aware image synthesis results on both real-world and synthetic datasets. They also demonstrated that their method generalizes to new viewpoints and can be applied to view-generation tasks such as reconstructing new scenes or synthesizing a particular viewpoint from an existing collection of images.
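Because the scene is a 3D radiance field rather than a 2D image, rendering it from a new viewpoint only requires casting rays from a different camera. A minimal sketch, assuming a pinhole camera placed on a sphere around the origin and parameterised by yaw and pitch; the function names, the +y world-up convention, and the default field of view are hypothetical choices for illustration. Points sampled along each ray would then be passed to the generator and composited with volume rendering.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def camera_ray(yaw, pitch, radius, u, v, fov_deg=12.0):
    """Origin and unit direction of the ray through pixel (u, v) in [-1, 1]^2.

    Hypothetical setup: the camera sits on a sphere of the given radius and
    looks at the origin; pitch must stay strictly between -pi/2 and pi/2.
    """
    origin = [radius * math.cos(pitch) * math.sin(yaw),
              radius * math.sin(pitch),
              radius * math.cos(pitch) * math.cos(yaw)]
    forward = normalize([-c for c in origin])           # look at the origin
    right = normalize(cross(forward, [0.0, 1.0, 0.0]))  # world up is +y
    up = cross(right, forward)
    scale = math.tan(math.radians(fov_deg) / 2)         # half-width of the image plane
    direction = normalize([f + scale * (u * r + w * v)
                           for f, r, w in zip(forward, right, up)])
    return origin, direction


# The centre pixel of a camera at an unseen pose still points at the origin.
origin, direction = camera_ray(yaw=0.3, pitch=0.2, radius=1.0, u=0.0, v=0.0)
```

Changing only `yaw` and `pitch` moves the camera while the generator and latent code stay fixed, which is what makes the rendered views of one generated object consistent with each other.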
In summary, the paper makes three main contributions:
- Introduction of SIREN-based implicit GANs as a viable alternative to convolutional GAN architectures.
- The researchers propose a mapping network with FiLM conditioning and a progressively growing discriminator as the principal components for achieving the best results with their novel SIREN-based implicit GAN.
- The researchers achieved state-of-the-art results on 3D-aware image synthesis from unsupervised 2D data on the CelebA, Cats, and CARLA datasets.
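The progressive growing idea can be illustrated with a schedule that raises the training resolution in stages and gradually fades each newly added discriminator block in, in the style of progressively grown 2D GANs. This is a sketch under stated assumptions: the stage boundaries, resolutions, fade length, and function names (`progressive_schedule`, `fade_in`) are hypothetical and are not the values used by π-GAN.

```python
def progressive_schedule(step, stages, fade_steps):
    """Return (resolution, alpha) for a training step (hypothetical schedule).

    stages: list of (start_step, resolution) pairs sorted by start_step.
    alpha ramps linearly from 0 to 1 over fade_steps after each resolution
    increase, controlling how much the newly added discriminator block is used.
    """
    current_start, resolution = stages[0]
    for start, res in stages:
        if step >= start:
            current_start, resolution = start, res
    if current_start == stages[0][0]:
        return resolution, 1.0  # nothing to fade in during the first stage
    alpha = min(1.0, (step - current_start) / fade_steps)
    return resolution, alpha


def fade_in(new_features, old_features, alpha):
    """Blend the new block's output with the skip path: alpha*new + (1-alpha)*old."""
    return [alpha * n + (1 - alpha) * o for n, o in zip(new_features, old_features)]


# Hypothetical stages: train at 32x32, then 64x64, then 128x128.
STAGES = [(0, 32), (10_000, 64), (20_000, 128)]
resolution, alpha = progressive_schedule(11_000, STAGES, fade_steps=2_000)
```

Spending early training steps at low resolution is cheaper per step, and the linear fade keeps the discriminator from being disrupted abruptly when a new block is added.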