The Generative Adversarial Network (GAN) is a deep neural network architecture that may learn from training data and produce new data that shares the same properties as the training data. Generally, the GAN, which aims to transfer a pre-trained GAN to a given domain with limited training data, is composed of two complementary neural networks, the generator, and the discriminator. The generator learns to generate fake data, while the discriminator learns to distinguish the generator’s fake data from original examples. Thanks to excellent pre-trained GANs such as StyleGAN and BigGAN, GAN adaption has become a well-studied research topic.
Pre-trained GANs on large datasets are used to reduce the impact of the lack of data and hasten the learning process for a new domain. Three cases can be distinguished: few-shot, one-shot, and zero-shot.
Yet, there are limits in the existing one-shot settings since each exemplar provides extensive information more than texture style and general color. Previous works tend to transfer the artistic style while the entities are
neglected. Moreover, the entity (such as glasses), a crucial style component, must be transferred simultaneously with the color style. Additionally, the impact of huge entities on the color space makes it simple for earlier works to produce artifacts. To deal with those problems, a Chinese research team proposed dividing the adaptation from the source domain to the target domain into two parts: 1) Transferring global style elements like texture and color. 2) Generating new entities outside of the source domain.
Unlike previous works focusing on style transfer, the proposed new method addresses the generalized one-shot adaption task for both entity and style transfer. A reference image and its binary entity mask are used to perform this operation. The StyleGAN model pre-trained on FFHQ is used as the baseline. The authors introduced a target generator Gt formed by a generator Gt’ inherited from the source generator Gs and an auxiliary network (aux) trained from scratch. The aux is made to deal with entities, while Gt’ aims to concentrate on stylizing clear faces to leverage the prior knowledge stored in Gs.
The training process consists of three main parts:
1- Style fixation and exemplar reconstruction to obtain approximately the style of the examplar.
2- Internal distribution learning to minimize the distance of internal patch distributions between syntheses and exemplars for style and entity transfer.
3- Manifold Regularization to avoid content distortion during the training.
The overall cost function comprises four loss functions: reconstruction loss, style loss, entity loss, and the variational Laplacian regularization.
The evaluation of the introduced method was made through an experimental study. The authors employ the face alignment network to extract facial landmarks and compute the Normalized mean error (NME) to evaluate the cross-domain correlation objectively. Compared to state-of-the-art methods, the proposed approach gives the best scores in terms of NME. It achieves satisfying stability and gets similar NME in different fields. An ablation study was also carried out to demonstrate the contribution of each of the four loss functions.
This article presented a new GAN adaption network proposed for the generalized one-shot GAN adaption task. Despite the encouraging performance obtained, the authors confirm that some imitations exist because the proposed framework relies heavily on learning internal distribution. The most significant is its inability to properly manage the position of the identity, which might result in failures when the posture varies excessively. Another restriction is that the entity cannot be very complex, making patch distribution harder.
This Article is written as a research summary article by Marktechpost Staff based on the research paper 'Generalized One-shot Domain Adaption of Generative Adversarial Networks'. All Credit For This Research Goes To Researchers on This Project. Check out the paper and github. Please Don't Forget To Join Our ML Subreddit
Mahmoud is a PhD researcher in machine learning. He also holds a
bachelor's degree in physical science and a master's degree in
telecommunications and networking systems. His current areas of
research concern computer vision, stock market prediction and deep
learning. He produced several scientific articles about person re-
identification and the study of the robustness and stability of deep