Researchers from Imperial College London Propose FitMe: An AI Model that Turns Arbitrary Facial Images to Relightable Facial Avatars, Directly Usable in Common Gaming and Rendering Engines

Despite the enormous advancements in the previous ten years, 3D facial reconstruction from a single unconstrained image remains a significant research issue with a vibrant computer vision community. Its uses are now numerous and diverse, including but not limited to human digitization for applications in virtual and augmented reality, social media and gaming, the generation of synthetic datasets, and health applications. Recent studies, however, frequently need to produce components that may be utilized for photorealistic rendering and fall short of precisely recreating the identities of various people. 

3D Morphable Models (3DMM) are a popular method for obtaining face form and appearance from a single “in-the-wild” shot. This can be attributed to several factors, including the need for comprehensive datasets of scanned human geometry and reflectance, the limited and muddled information found in a single facial image, and the limitations of current statistical and machine-learning methods. To model face shape and appearance with variable identity and expression, which were learned from over 200 participants, Principal Component Analysis (PCA) was used in the initial 3DMM study. 

Since then, more complex models comprising thousands of individuals, such as the LSFM, Basel Face Model, and Facescape, have been developed. Additionally, 3DMMs of entire human heads or other facial features, including the ears and tongue, have been developed recently. Finally, subsequent publications have included expansions that range from directly regressing 3DMM parameters to non-linear models. Such models, however, are unable to create textures with photorealistic realism. Deep generative models have witnessed significant advancements during the past ten years. Progressive GAN architectures, in particular, have produced outstanding results in learning distributions of high-resolution 2D photographs of human faces using Generative Adversarial Networks (GANs). 

Recently, meaningful latent regions that may be traversed to reconstruct and control various aspects of the produced samples have been learned using style-based progressive generative networks. Some techniques, like UV mapping, have also successfully acquired a 2D representation of 3D face features. To produce 2D facial pictures, rendering functions can use 3D facial models produced by 3DMMs. Iterative optimization also necessitates differentiating the rendering process. Recent developments in the photorealistic differentiable rendering of such assets are made possible by differentiable rasterization, photorealistic face shading, and rendering libraries. 

Unfortunately, the Lambertian shading model used in 3DMM works falls short of accurately representing the intricacy of face reflectance. The problem is that more than a single RGB texture is needed for lifelike facial representation, which calls for various facial reflectance factors. Although recent attempts have been made to simplify such settings, such datasets are few, tiny, and challenging to acquire. High-fidelity and relightable facial reflectance reconstructions have been made possible by several modern methods, including infrared ones. However, these reconstructions still need to be discovered. Furthermore, it has been demonstrated that strong models can capture facial looks using deep models but cannot display single or multiple picture reconstructions. 

In a contemporary alternative paradigm that relies on learned neural rendering, implicit representations capture avatar appearance and shape. Despite their excellent performance, standard renderers cannot employ such implicit representations and are typically not relightable. The most current Albedo Morphable Model (AlbedoMM) also uses a linear PCA model to record facial reflectance and shape. Still, the per-vertex colour and normal reconstruction are too low-resolution for photorealistic depiction. From a single “in-the-wild” photograph, AvatarMe++ can rebuild high-resolution texture maps of facial reflectance. However, the three steps of the process—reconstruction, upsampling, and reflectance—cannot be directly optimized with the input image. 

Researchers from Imperial College London introduce FitMe which is a fully renderable 3DMM that can be fitted on free facial pictures using precise differentiable renderings based on high-resolution face reflectance texture maps. FitMe establishes identity similarity and produces highly realistic, fully renderable reconstructions that may be used immediately by rendering programs that are available off the shelf. The texture model is built as a multimodal style-based progressive generator that simultaneously creates the face’s surface normals, specular albedo, and diffuse albedo. A painstakingly crafted branching discriminator allows easy training with various statistics modalities. 

They optimize AvatarMe++ on the publicly available MimicMe dataset to build a capture quality face reflectance dataset of 5k people, which they further modify to balance skin-tone representation. A face and a head PCA model, trained on sizable geometry datasets, are used interchangeably for the form. They create a style-based generator projection and 3DMM fitting-based single- or multi-image fitting approach. The rendering function must be differentiable and quick to do effective iterative fitting (in less than one minute), rendering models like path tracing useless. Prior research has relied on slower optimization or simpler shading models (such as Lambertian). 

They improve on previous work by adding shading that is more lifelike in appearance and has convincing diffuse and specular rendering that can acquire form and reflectance for photorealistic rendering in common rendering engines (Fig. 1). FitMe can rebuild high-fidelity facial reflectance and achieve remarkable identity similarity while precisely capturing features in diffuse, specular albedo, and normals because to the flexibility of the generator’s expanded latent space and the photorealistic fitting. 

Figure 1: FitMe uses a reflectance model and differentiable rendering to reconstruct relightable form and reflectance maps for facial avatars from a single (left) or several (right) unconstrained face pictures. In typical engines, the findings can be displayed in photorealistic detail.

Overall, in this work, they present the following: 

• The first 3DMM capable of producing high-resolution facial reflectance and shape, with an increasing level of detail, that can be rendered in a photorealistic manner

• A technique to acquire and augment

•The first branched multimodal style-based progressive generator of high-resolution 3D facial assets (diffuse albedo, specular albedo, and normals), as well as a suitable multimodal branched discriminator

Check Out The Paper and Project Page. Don’t forget to join our 22k+ ML SubRedditDiscord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at

🚀 Check Out 100’s AI Tools in AI Tools Club

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology(IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing and is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

🐝 Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...