Northeastern University and Microsoft Researchers Propose A Novel Two-Branch Technique That Expands StyleGAN’s Latent Space

This article is based on the research paper "Expanding the Latent Space of StyleGAN for Real Face Editing." All credit for this research goes to the authors of the paper.


Generative adversarial networks (GANs) have been widely used for image synthesis. In 2019, NVIDIA open-sourced the Style Generative Adversarial Network, or StyleGAN for short, a GAN architecture that significantly modifies the generator model. StyleGAN synthesizes an image in stages, starting at a low resolution and progressively increasing to a high resolution. By adjusting the input to each stage individually, the architecture controls the visual features expressed at that level, from coarse aspects (pose, face shape) to fine details (hair color), without affecting the other levels.
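The per-level style injection described above can be illustrated with a minimal adaptive-instance-normalization (AdaIN) style sketch in numpy. This is an assumption-laden toy, not StyleGAN's actual implementation: the function names, shapes, and the random inputs are all hypothetical, chosen only to show how a per-level style vector re-styles one resolution level without touching the others.

```python
import numpy as np

def adain_modulate(features, style, eps=1e-5):
    """Toy AdaIN-style modulation (illustrative, not StyleGAN's real code).

    features: (C, H, W) feature map at one resolution level.
    style:    (2*C,) per-level style vector, split into scale and bias.
    """
    c = features.shape[0]
    scale, bias = style[:c], style[c:]
    # Normalize each channel's statistics, then apply this level's style.
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True) + eps
    normed = (features - mean) / std
    return scale[:, None, None] * normed + bias[:, None, None]

# Coarse (low-resolution) levels control pose and face shape; fine
# (high-resolution) levels control details such as hair color. Each level
# receives its own style vector, so edits at one level leave the others alone.
rng = np.random.default_rng(0)
coarse = adain_modulate(rng.standard_normal((8, 4, 4)), rng.standard_normal(16))
fine = adain_modulate(rng.standard_normal((8, 64, 64)), rng.standard_normal(16))
```

Because each level is modulated by a separate style vector, swapping the vector at a fine level changes only fine detail, which is the disentanglement-by-resolution property the paragraph describes.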

Many other studies have introduced powerful generative networks with semantic image-editing abilities that can change a person’s age, expression, gender, and other characteristics in high-quality photographs. Performing such edits on real-life face photos, however, poses a number of difficulties. According to the researchers, finding StyleGAN latent variables that best preserve these traits while producing realistic modifications is challenging, because input images frequently contain out-of-distribution identities, haircuts, lighting conditions, etc. Furthermore, previous research has revealed entanglement issues, in which changing one facial feature inadvertently affects others.

To address these challenges, a Northeastern University and Microsoft research team proposes a novel two-branch technique that expands the latent space of StyleGAN. The method, described in their recent paper "Expanding the Latent Space of StyleGAN for Real Face Editing," enables identity-preserving and disentangled-attribute editing for real face photos. Their findings show that the method outperforms state-of-the-art algorithms both qualitatively and quantitatively.

By augmenting StyleGAN’s latent space with 2D content features and applying a sparsity constraint on style editing, the team achieves identity-preserving and disentangled-attribute editing for real face photos. This supplies the model with attribute-aware, image-specific information, which helps it perform better on real-world face editing. The team also states that an alignment loss and a feature fusion module enable local-region editing by controlling the relative influence of style and content features.
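One simple way to picture a sparsity constraint on style editing is to restrict an attribute edit direction to its few most influential entries before applying it to the style code. The sketch below is a hypothetical illustration, not the paper's method: the hard top-k mask, the 512-dimensional code, and the "age direction" are all assumed for demonstration.

```python
import numpy as np

def sparse_style_edit(style, direction, k=5, strength=1.0):
    """Apply an attribute edit direction to a style vector, keeping only
    the k largest-magnitude entries of the direction (a simple hard
    sparsity constraint), so unrelated style dimensions stay untouched."""
    keep = np.argsort(np.abs(direction))[-k:]   # indices of the k strongest entries
    mask = np.zeros_like(direction)
    mask[keep] = 1.0
    return style + strength * direction * mask

rng = np.random.default_rng(1)
w = rng.standard_normal(512)        # hypothetical latent/style code
age_dir = rng.standard_normal(512)  # hypothetical "age" edit direction
w_edit = sparse_style_edit(w, age_dir, k=5)
changed = np.flatnonzero(w_edit != w)
```

Because only k coordinates of the code move, an edit targeting one attribute is less likely to drag entangled attributes along with it, which is the intuition behind sparse manipulation.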

Given an input face image and target attributes, the proposed framework’s first (style) branch handles the entanglement issue through sparse manipulation of one-dimensional style features. The second (content) branch alleviates the distortion issue and enriches the edited image with additional appearance details using two-dimensional content features. This configuration allows the framework to synthesize a modified image with the target attributes while preserving appearance elements such as identity, background, and lighting conditions.
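A minimal way to sketch how two branches can be combined for local editing is a per-pixel gated blend of the two feature maps. This is an assumed illustration, not the paper's fusion module: the gating mask, shapes, and function name are hypothetical.

```python
import numpy as np

def fuse(style_feat, content_feat, alpha):
    """Blend the style-branch features (carrying the target attribute)
    with the content-branch features (carrying identity/background).
    alpha is a per-pixel gate in [0, 1]: 1 takes the edited style
    feature, 0 keeps the original content feature, so the edit can be
    confined to a local region."""
    return alpha * style_feat + (1.0 - alpha) * content_feat

rng = np.random.default_rng(2)
style_feat = rng.standard_normal((8, 16, 16))
content_feat = rng.standard_normal((8, 16, 16))
alpha = np.zeros((1, 16, 16))
alpha[:, 4:12, 4:12] = 1.0          # edit only a central region
out = fuse(style_feat, content_feat, alpha)
```

Outside the masked region the output equals the content features exactly, which mirrors the stated goal of preserving identity, background, and lighting while applying the edit locally.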


On the FFHQ and CelebA-HQ face datasets, the team tested its method against various state-of-the-art inversion algorithms on real-world face editing and reconstruction tasks. The test results demonstrate that the proposed two-branch technique outperformed all other methods in terms of perceptual similarity, pixel-wise distance, peak signal-to-noise ratio, and structural similarity metrics on out-of-domain data.
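Two of the reconstruction metrics mentioned above are easy to state concretely. The sketch below gives standard textbook definitions of pixel-wise distance and peak signal-to-noise ratio (PSNR) in numpy; it is not the paper's evaluation code, and the example images are fabricated for illustration only.

```python
import numpy as np

def pixelwise_l2(a, b):
    """Mean per-pixel L2 distance between two (H, W, C) images in [0, 1]."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=-1)).mean())

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    mse = float(((a - b) ** 2).mean())
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a flat gray image and a uniformly brightened "reconstruction".
img = np.full((4, 4, 3), 0.5)
recon = img + 0.1
score = psnr(img, recon)   # 20 dB for a uniform 0.1 error
```

Perceptual similarity (e.g., LPIPS) and structural similarity (SSIM) are more involved, comparing deep features and local luminance/contrast/structure statistics respectively, rather than raw pixel differences.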