Researchers From Princeton and Adobe Propose 3D-FM GAN: A Novel Conditional GAN Framework Designed Specifically For 3D-Controllable Face Manipulation

3D-controllable portrait synthesis has advanced significantly. However, it is still difficult to precisely edit existing face photos in 3D. Simply chaining GAN inversion with a 3D-aware, noise-to-image GAN is straightforward but ineffective, and it can noticeably degrade editing quality. To close this gap, a team of researchers from Princeton and Adobe introduced 3D-FM GAN, a novel conditional GAN framework designed specifically for 3D-controllable face manipulation. Once trained end to end, the framework requires no further tuning. By carefully encoding both the input face image and a physically-based representation of the 3D edits into StyleGAN's latent spaces, their image generator produces high-quality, identity-preserving, 3D-controllable face edits.

The work is described in the newly published paper '3D-FM GAN: Towards 3D-Controllable Face Manipulation'. The framework enables precise 3D-controllable face manipulation with high photorealism and strong identity preservation, without requiring manual tuning or per-image optimization.


The group's main contributions are summarized below:

  • The researchers propose 3D-FM GAN, a novel conditional GAN framework designed specifically for precise, explicit, high-quality 3D-controllable face manipulation.
  • They design two key training procedures for learning the model effectively: reconstruction training and disentangled training. They also thoroughly investigate StyleGAN's latent spaces to guide the architecture design, arriving at a novel multiplicative co-modulation architecture with a favorable identity-editability trade-off.
  • Extensive quantitative and qualitative evaluations show that their approach outperforms existing techniques. The model also generalizes well, editing artistic faces that lie outside the training domain.
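The two training procedures differ mainly in what the generator is asked to produce. The sketch below is a hypothetical illustration, not the authors' code. It assumes that reconstruction training pairs an image with a rendering of its own 3D coefficients (so the target is the image itself), and that disentangled training pairs one image with a rendering of a second image of the same identity (so the target is the second image). The `render` callable, the `TrainingExample` type, and the other names are placeholders, and the simple L1 loss stands in for whatever combination of losses the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

import numpy as np

# Hypothetical aliases: an "image" is an H x W x 3 float array and a
# "coefficient vector" is a flat float array of 3D face parameters.
Image = np.ndarray
Coeffs = np.ndarray


@dataclass
class TrainingExample:
    source: Image    # image whose identity and appearance should be kept
    rendered: Image  # rendering that encodes the desired 3D edit
    target: Image    # image the generator should produce


def reconstruction_example(image: Image, coeffs: Coeffs,
                           render: Callable[[Coeffs], Image]) -> TrainingExample:
    # Assumed reconstruction training: condition on a rendering of the image's
    # own 3D coefficients, so the correct output is the input image itself.
    return TrainingExample(source=image, rendered=render(coeffs), target=image)


def disentangled_example(pair: Tuple[Image, Image], target_coeffs: Coeffs,
                         render: Callable[[Coeffs], Image]) -> TrainingExample:
    # Assumed disentangled training: two images of the same identity that differ
    # in pose, expression, or lighting. The generator sees image A plus a
    # rendering of B's coefficients and must produce image B.
    image_a, image_b = pair
    return TrainingExample(source=image_a, rendered=render(target_coeffs), target=image_b)


def l1_loss(output: Image, target: Image) -> float:
    # Placeholder pixel loss; the real objective likely combines several terms.
    return float(np.mean(np.abs(output - target)))
```

Only the pairing logic is shown; the encoders, generator, and renderer would plug in where `render` and the generator output appear.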

Given an input face image, 3D-FM GAN can produce photorealistic, disentangled edits of head pose, facial expression, and scene lighting while preserving the person's identity.


In the 3D-FM GAN framework, a face reconstruction network predicts the 3D coefficients of the input image, and a physically-based renderer produces a rendering with the desired expression, lighting, and pose edits applied. A StyleGAN-based conditional generator then synthesizes the edited face from the original image and this rendering of the edited face.
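The editing step can be pictured as surgery on a coefficient vector: predict coefficients for the input, overwrite only the attributes being edited, and leave identity and texture untouched before rendering. The minimal Python sketch below assumes a 257-dimensional layout of the kind produced by common 3DMM-based face reconstruction networks (identity, expression, texture, pose angles, spherical-harmonics lighting, translation). The sizes, the `LAYOUT` table, and the function names are illustrative assumptions, and the renderer and generator are omitted.

```python
import numpy as np

# ASSUMED coefficient layout, modeled on common 3DMM-based reconstruction
# networks; the exact sizes used by 3D-FM GAN may differ.
LAYOUT = {
    "identity": 80,
    "expression": 64,
    "texture": 80,
    "angles": 3,       # head pose rotation
    "lighting": 27,    # spherical-harmonics lighting: 9 coefficients per RGB channel
    "translation": 3,  # head pose translation
}

# Attributes a user may edit; identity and texture are kept fixed.
EDITABLE = {"expression", "angles", "lighting", "translation"}


def split(coeffs: np.ndarray) -> dict:
    # Slice the flat vector into named attribute blocks.
    parts, start = {}, 0
    for name, size in LAYOUT.items():
        parts[name] = coeffs[start:start + size]
        start += size
    if start != coeffs.shape[0]:
        raise ValueError(f"expected {start} coefficients, got {coeffs.shape[0]}")
    return parts


def merge(parts: dict) -> np.ndarray:
    # Reassemble the blocks in layout order into a new flat vector.
    return np.concatenate([parts[name] for name in LAYOUT])


def apply_edit(coeffs: np.ndarray, edits: dict) -> np.ndarray:
    # Replace only the requested attribute blocks; the input array is not modified.
    parts = split(coeffs)
    for name, value in edits.items():
        if name not in EDITABLE:
            raise KeyError(f"{name!r} is not an editable attribute")
        value = np.asarray(value, dtype=coeffs.dtype)
        if value.shape != parts[name].shape:
            raise ValueError(f"{name}: expected shape {parts[name].shape}, got {value.shape}")
        parts[name] = value
    return merge(parts)
```

For example, `apply_edit(coeffs, {"angles": [0.0, 0.3, 0.0]})` changes only the pose rotation, while requesting an `"identity"` edit raises `KeyError`, mirroring the goal of keeping identity fixed while pose, expression, and lighting change.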

To preserve facial identity while supporting 3D editing, the team also devised two key training strategies: reconstruction training and disentangled training. Their structural design, informed by an exploration of StyleGAN's latent space, yields a multiplicative co-modulation architecture that achieves a favorable identity-editability trade-off.
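Co-modulation means the per-layer StyleGAN styles depend on two codes at once, here one derived from the input image and one from the rendered edit. The sketch below shows one plausible reading of the multiplicative variant, contrasted with concatenation-based co-modulation: each code passes through its own affine map, the results are multiplied elementwise, and the combined style then drives StyleGAN2-style weight modulation and demodulation. The affine matrices, dimensions, and function names are assumptions for illustration; the paper's exact formulation (for example, which latent space the product is taken in) may differ.

```python
import numpy as np


def affine_style(w: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Learned affine map from a latent code to per-input-channel style scales.
    return A @ w + b


def concat_style(w_src, w_edit, A, b):
    # Concatenation-based co-modulation (for contrast): one affine map over both codes.
    return A @ np.concatenate([w_src, w_edit]) + b


def multiplicative_style(w_src, w_edit, A_src, b_src, A_edit, b_edit):
    # HYPOTHETICAL multiplicative co-modulation: separate affine maps per code,
    # combined by elementwise product.
    return affine_style(w_src, A_src, b_src) * affine_style(w_edit, A_edit, b_edit)


def modulated_weight(weight, style, eps=1e-8):
    # StyleGAN2 modulation: scale each input channel of the conv kernel by the style.
    # weight has shape (out_channels, in_channels, k, k); style has shape (in_channels,).
    w = weight * style[None, :, None, None]
    # Demodulation: rescale each output channel's kernel to (approximately) unit L2 norm.
    norm = np.sqrt(np.sum(w ** 2, axis=(1, 2, 3), keepdims=True) + eps)
    return w / norm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent_dim, in_ch, out_ch = 8, 4, 6
    w_src = rng.standard_normal(latent_dim)   # stand-in for the encoded input image
    w_edit = rng.standard_normal(latent_dim)  # stand-in for the encoded rendering
    A_src = rng.standard_normal((in_ch, latent_dim))
    A_edit = rng.standard_normal((in_ch, latent_dim))
    ones = np.ones(in_ch)  # bias of 1, as StyleGAN2 initializes its style affines
    style = multiplicative_style(w_src, w_edit, A_src, ones, A_edit, ones)
    kernel = rng.standard_normal((out_ch, in_ch, 3, 3))
    print(modulated_weight(kernel, style).shape)
```

With concatenation, a single affine layer mixes the two codes additively; with the product, either code can rescale, or even suppress, a style channel contributed by the other.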

The team's empirical study used 5,000 images from the Flickr-Faces-HQ (FFHQ) dataset to assess the identity retention and editing controllability of 3D-FM GAN, as well as the photorealism of its edits.
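The summary does not name the specific metrics. As a hedged illustration of how such quantities are commonly computed: identity retention is often scored as the cosine similarity between face-recognition embeddings of the input and edited images, and editing controllability as the distance between the requested 3D coefficients and those re-estimated from the edited output. The sketch below assumes the embeddings and coefficients have already been extracted by pretrained networks (not shown); the function names are hypothetical. Photorealism is usually measured with FID, which requires a pretrained Inception model and is omitted here.

```python
import numpy as np


def cosine_similarity(a, b, eps: float = 1e-8) -> float:
    # Cosine of the angle between two embedding vectors.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def identity_score(source_embeddings, edited_embeddings) -> float:
    # Mean cosine similarity over (input, edited) pairs of ASSUMED face-recognition
    # embeddings; higher means better identity retention.
    scores = [cosine_similarity(s, e) for s, e in zip(source_embeddings, edited_embeddings)]
    return float(np.mean(scores))


def control_error(target_coeffs, recovered_coeffs) -> float:
    # Mean L2 distance between the requested attribute coefficients and those
    # re-estimated from the edited images; lower means more accurate control.
    t = np.asarray(target_coeffs, dtype=float)
    r = np.asarray(recovered_coeffs, dtype=float)
    return float(np.mean(np.linalg.norm(t - r, axis=1)))
```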

In these experiments, 3D-FM GAN outperformed competing approaches, showing superior editability, identity retention, and photorealism. The group also notes that 3D-FM GAN shows improved generalization to large pose edits and to artistic images from outside the training domain.

This article is written as a research summary by Marktechpost Staff based on the research paper '3D-FM GAN: Towards 3D-Controllable Face Manipulation'. All credit for this research goes to the researchers on this project. Check out the paper and project page.
