The StyleGAN architecture generates high-quality images, but it offers no precise control over camera pose. Recent NeRF-based generators have made progress toward 3D-aware synthesis, yet they still fall short of photorealistic image quality.
Researchers at Huawei and Shanghai Jiao Tong University have developed CIPS-3D, an approach that synthesizes each pixel value independently, just as its 2D counterpart does.
The proposed generator consists of a shallow 3D NeRF network, simplified to reduce memory cost, followed by a deep 2D INR (implicit neural representation) network that uses no spatial convolution or up-sampling operations. This design is consistent with the well-known semantic hierarchy of GANs, in which early layers (here, the shallow NeRF network) determine pose while middle and higher layers (the INR network) control the color scheme. The early NeRF network lets the researchers control the camera pose explicitly and easily.
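The two-stage, pixel-wise design can be illustrated with a minimal sketch: a shallow MLP standing in for the NeRF stage maps a 3D sample per ray to a feature vector, and a deeper MLP standing in for the INR stage maps that feature to RGB, with no convolutions or up-sampling anywhere. All layer sizes, weight initializations, and the one-sample-per-ray simplification here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    # Apply fully connected layers with ReLU between them (none after the last).
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def make_weights(dims):
    # Random weights for a chain of linear layers; sizes are illustrative.
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

# Shallow NeRF-like stage: 3D point -> feature (camera pose would enter
# through how these points are sampled along rays).
nerf_weights = make_weights([3, 32, 32])
# Deep INR stage: per-pixel feature -> RGB, no spatial ops at all.
inr_weights = make_weights([32, 64, 64, 64, 3])

H = W = 4
# Hypothetical ray samples: one 3D point per pixel, for brevity.
points = rng.normal(size=(H * W, 3))
features = mlp(points, nerf_weights)   # shallow 3D stage
rgb = mlp(features, inr_weights)       # deep 2D stage, fully pixel-wise
image = rgb.reshape(H, W, 3)
```

Because every pixel is computed independently from its own feature vector, any subset of pixels can be rendered on its own, which is what makes the high-resolution training strategy described below possible.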
CIPS-3D suffers from a mirror-symmetry problem, which also appears in other 3D-aware GANs such as GIRAFFE and StyleNeRF. The researchers explain why this happens rather than simply attributing it to dataset bias, and they solve it by adding an auxiliary discriminator to the network. They also propose partial gradient backpropagation as a training strategy that makes training CIPS-3D at high resolution feasible.
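The idea behind partial gradient backpropagation is that, since pixels are synthesized independently, the loss gradient can be accumulated from only a random subset of pixels rather than the full image, cutting memory at high resolution. The toy below sketches this with a linear "generator" and a hand-derived gradient of a mean-squared loss; the model, loss, and 25% sampling fraction are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear "generator": parameters w map per-pixel coordinates to values.
coords = rng.normal(size=(16, 2))   # 16 pixels, each with a 2D coordinate
w = rng.normal(size=(2,))

def forward(w):
    # One synthesized value per pixel.
    return coords @ w

def grad_loss_subset(w, frac=0.25):
    # Loss = mean of squared pixel values; backprop through a sampled
    # subset of pixels only, rescaled so it estimates the full gradient.
    n = coords.shape[0]
    k = max(1, int(n * frac))
    idx = rng.choice(n, size=k, replace=False)
    pixels = forward(w)
    return (2.0 / k) * coords[idx].T @ pixels[idx]

g = grad_loss_subset(w)
```

In expectation the subsampled gradient matches the full-image gradient, so training proceeds as usual while only a fraction of pixels need to hold activations for the backward pass.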
The researchers validated the advantages of CIPS-3D on high-resolution face datasets, including FFHQ, MetFaces, BitmojiFaces, and CartoonFaces, as well as the animal dataset AFHQ. Details can be found in the research paper and on GitHub; the links are given below.