The goal of blind face restoration is to recover high-quality face images from low-quality counterparts that have suffered unknown degradations, such as noise, blur, low resolution, and compression artifacts. In this work, researchers from Tencent's Applied Research Center propose GFP-GAN, a GAN that exploits a Generative Facial Prior for real-world blind face restoration. As Figure 1 shows, images restored by GFP-GAN achieve higher realness and fidelity with fewer artifacts.
Figure 2 shows the overall architecture of GFP-GAN. Given a degraded input image, the goal of GFP-GAN is to generate a high-quality image that is as close as possible to the non-degraded ground truth. Overall, GFP-GAN consists of a degradation removal module and a pre-trained face GAN such as StyleGAN2. The following sections describe these modules.
Degradation removal module
In this paper, the authors use a U-Net as the degradation removal module. It is in charge of removing degradation from the input image and extracting “clean” features (named Flatent and Fspatial) that are later used by StyleGAN2. To provide intermediate supervision during degradation removal, the authors rely on a restoration loss: GFP-GAN outputs an image at each resolution scale of the U-Net decoder, and each of these images is forced to be close to the ground-truth image at the same resolution.
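This multi-scale intermediate supervision can be sketched as follows. This is an illustrative numpy sketch, not the paper's implementation: images are assumed to be HxWx3 arrays, downsampling is done with naive strided slicing, and the pixel-wise L1 distance stands in for the full restoration loss.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between a restored image and the ground truth."""
    return np.abs(pred - target).mean()

def multi_scale_restoration_loss(decoder_outputs, ground_truth):
    """Intermediate supervision (sketch): each U-Net decoder scale emits an
    image that is pulled toward the ground truth resized to that resolution.
    `decoder_outputs` maps a downsampling factor to the HxWx3 image emitted
    at that scale (hypothetical interface)."""
    total = 0.0
    for factor, out in decoder_outputs.items():
        # Naive nearest-neighbour downsampling of the ground truth (illustrative).
        gt_small = ground_truth[::factor, ::factor]
        total += l1_loss(out, gt_small)
    return total
```

With perfect outputs at every scale, the loss is zero; any deviation at any scale increases it, which is what drives the decoder to produce clean images early on.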
Generative Facial Prior and Latent Code Mapping
The intuition of this work is that, since a pre-trained face GAN captures a distribution over human faces, it can be used to enhance the restoration of degraded images. Typically, one maps the input image to its closest latent code in the latent space of the pre-trained GAN and then uses the same GAN to generate the corresponding output. However, such a solution usually requires a time-consuming iterative optimization to reach satisfying results. For this reason, the authors instead generate intermediate features FGAN of the closest face, modulated by the latent features Flatent from the degradation removal module to increase fidelity to the input image. In particular, Flatent is first mapped to intermediate latent codes W through a multi-layer perceptron (MLP). The latent codes W then pass through each convolutional layer of the pre-trained GAN to produce GAN features at each resolution scale. These features are essential since they provide facial details captured by the weights of the pre-trained GAN. They also carry information about the face's colors, which is helpful for color enhancement, including the colorization of old black-and-white photos.
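The mapping from Flatent to per-layer latent codes W can be sketched with a toy stand-in for the MLP. Everything here is hypothetical: random weights replace the trained MLP, and one code is produced per GAN layer, which is the role W plays when it is fed into each convolutional layer of StyleGAN2.

```python
import numpy as np

def map_to_latent_codes(f_latent, num_layers, code_dim, seed=0):
    """Illustrative stand-in for the MLP that maps the degradation-removal
    features F_latent to intermediate latent codes W, one code per GAN
    layer. Random weights replace the trained parameters."""
    rng = np.random.default_rng(seed)
    codes = []
    h = np.ravel(f_latent)
    for _ in range(num_layers):
        w_mat = rng.standard_normal((code_dim, h.size)) * 0.01
        h = np.tanh(w_mat @ h)  # one hidden transform per layer (toy choice)
        codes.append(h)
    return codes  # W: list of per-layer latent codes fed to the GAN
```

The key design point survives the simplification: instead of optimizing a latent code per image, a single feed-forward mapping produces W directly, so restoration runs in one pass.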
Channel-Split Spatial Feature Transform
Finally, to better preserve fidelity, the spatial features Fspatial produced by the degradation removal module are used to spatially modulate the GAN features FGAN, preserving spatial information from the input image. To this end, GFP-GAN relies on a Spatial Feature Transform (SFT), which applies affine transformations (i.e., scaling and shifting) to FGAN. Moreover, to balance the trade-off between realness and fidelity, the authors propose Channel-Split Spatial Feature Transform (CS-SFT) layers: the GAN features are split along the channel dimension, one part is spatially modulated through Fspatial to enhance fidelity, and the remaining part is left unchanged to preserve realness. CS-SFT layers are applied at each resolution scale before the final generation of the restored face.
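The channel-split idea is compact enough to show directly. In this numpy sketch, the scale and shift maps are assumed to be predicted from Fspatial by small convolutions (omitted here); the split ratio is a free parameter in the sketch.

```python
import numpy as np

def cs_sft(f_gan, scale, shift, split_ratio=0.5):
    """Channel-Split Spatial Feature Transform (sketch).
    f_gan: GAN features of shape (C, H, W).
    scale, shift: per-pixel affine parameters derived from F_spatial,
    shaped to match the modulated channel slice.
    A fraction of the channels is spatially modulated (fidelity branch);
    the rest passes through untouched (realness branch)."""
    c = f_gan.shape[0]
    k = int(c * split_ratio)                    # channels to modulate
    modulated = scale * f_gan[:k] + shift       # affine (scale-and-shift)
    identity = f_gan[k:]                        # unchanged GAN channels
    return np.concatenate([modulated, identity], axis=0)
```

Because only part of the feature tensor is rewritten by the input-derived statistics, details hallucinated by the prior survive in the identity branch while the modulated branch anchors the output to the actual input face.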
The loss function used during training combines four terms.
The reconstruction loss forces the output to be as close as possible to the ground truth.
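In the paper, closeness to the ground truth is measured both pixel-wise and in a feature space. As a hedged sketch, the snippet below pairs an L1 pixel term with a perceptual term; `feat_extractor` is a hypothetical stand-in for a frozen pre-trained feature network, and the weights are illustrative.

```python
import numpy as np

def reconstruction_loss(restored, gt, feat_extractor, l1_w=1.0, per_w=1.0):
    """Sketch of a reconstruction loss: pixel-wise L1 plus a perceptual
    term computed in a feature space. `feat_extractor` stands in for a
    frozen pre-trained network (hypothetical interface)."""
    pixel = np.abs(restored - gt).mean()
    perceptual = np.abs(feat_extractor(restored) - feat_extractor(gt)).mean()
    return l1_w * pixel + per_w * perceptual
```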
The adversarial loss pushes GFP-GAN to generate images that look as natural as possible. To achieve this, a discriminator is trained to distinguish authentic images from those generated by GFP-GAN. Thanks to this loss, GFP-GAN learns to generate images that the discriminator cannot tell apart from real ones.
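This adversarial game can be written down concretely with the non-saturating logistic loss used in StyleGAN2-style training; the sketch below assumes the discriminator outputs raw logits (higher means "looks real") and uses a numerically stable softplus.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def generator_adv_loss(d_fake_logits):
    """The generator is rewarded when the discriminator scores its
    restored faces as real (high logits)."""
    return softplus(-d_fake_logits).mean()

def discriminator_adv_loss(d_real_logits, d_fake_logits):
    """The discriminator learns to score authentic faces high and
    restored faces low."""
    return softplus(-d_real_logits).mean() + softplus(d_fake_logits).mean()
```

When the discriminator is maximally confused (logit 0), the generator loss sits at log 2; when it reliably separates real from fake, the generator loss grows, pushing the restored images toward the natural-image distribution.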
The goal of the facial component loss is to enhance the level of detail in key parts of the face: the left eye, the right eye, and the mouth. In particular, the authors employ a local discriminator for each of these three regions, trained to distinguish whether the restored patches look authentic. This pushes the patches closer to natural facial components.
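A sketch of the region-level part of this loss: crop each component, score it with its local discriminator, and penalize patches the discriminator rejects. The box format and the discriminator interface here are hypothetical simplifications.

```python
import numpy as np

def crop_component(image, box):
    """Crop a facial region; box = (top, left, height, width) (assumed format)."""
    t, l, h, w = box
    return image[t:t + h, l:l + w]

def facial_component_loss(restored, boxes, local_discriminators):
    """Sketch: each region (left eye, right eye, mouth) has its own local
    discriminator. `local_discriminators` maps a region name to a callable
    returning a realness logit for a patch (hypothetical interface);
    patches scored as fake (low logits) incur a high loss."""
    total = 0.0
    for name, box in boxes.items():
        patch = crop_component(restored, box)
        logit = local_discriminators[name](patch)
        total += np.log1p(np.exp(-logit))  # softplus(-logit): encourage "real"
    return total
```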
Finally, the authors consider an identity-preserving loss. A pre-trained face recognition model (i.e., ArcFace) captures essential features for identity discrimination, and the loss forces the restored image to be close to the ground truth in ArcFace's feature space.
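In sketch form, this term only needs a frozen embedding function and a distance in embedding space; `embed` below is a hypothetical stand-in for the ArcFace network, and L1 is used as an illustrative distance.

```python
import numpy as np

def identity_preserving_loss(restored, gt, embed):
    """Sketch of the identity-preserving loss. `embed` stands in for a
    frozen face-recognition network such as ArcFace; the restored image's
    embedding is pulled toward the ground truth's embedding so that the
    person's identity survives restoration."""
    e_restored, e_gt = embed(restored), embed(gt)
    return np.abs(e_restored - e_gt).mean()  # distance in identity space
```

Because the recognition network is frozen, gradients flow only into GFP-GAN: the generator is steered toward outputs that the recognizer maps to the same identity as the ground truth.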
This article is written as a research summary by Marktechpost Staff based on the research paper 'Towards Real-World Blind Face Restoration with Generative Facial Prior'. All credit for this research goes to the researchers on this project. Check out the paper and the GitHub link.
Luca is a Ph.D. student at the Department of Computer Science of the University of Milan. His interests are Machine Learning, Data Analysis, IoT, Mobile Programming, and Indoor Positioning. His research currently focuses on Pervasive Computing, Context-awareness, Explainable AI, and Human Activity Recognition in smart environments.