Modern virtual reality applications require technology that supports photorealistic rendering and reconstruction of human faces. Because people are social and read emotions from subtle changes in facial expression, even minute artifacts can trigger the uncanny valley effect and harm the user experience. To tackle hard problems such as novel view synthesis and the modeling of view-dependent effects, several contemporary 3D telepresence techniques now rely on deep learning models and neural rendering.
These methods are typically data hungry, and model performance depends directly on the design of the capture system and the data pipeline. Pushing the envelope in photorealistic face modeling therefore requires a sizable dataset of high-quality, multi-view facial images covering a wide range of expressions.
Meta researchers presented such a dataset, Multiface, in recent work. It was gathered with Mugsy, a sophisticated multi-view capture system that the team built. Compared with previous face modeling datasets, Multiface offers facial data of exceptional quality, a wider variety of facial expressions, and a larger number of camera views. The researchers recorded 13 participants performing a wide range of facial expressions in high fidelity. For each individual, approximately 100 facial expressions were recorded simultaneously by many machine vision cameras at a resolution of about 11 megapixels.
Three main obstacles must be overcome before data-driven avatars can be created: (1) novel view synthesis, which is difficult because researchers cannot place cameras everywhere; (2) novel expression synthesis, which is needed because participants cannot be asked to act out every possible facial expression during the capture; and (3) relighting, which is needed because it is impractical to capture every possible lighting arrangement.
The researchers ran a number of experiments on the dataset. They found that when fewer training cameras are available, it is important to make the network deeper by adding layers with residual connections. They also found that the more training camera views are available, the lower the reconstruction error. Finally, the reconstruction errors for novel view synthesis alone and for novel expression synthesis alone are much lower than the error for the combined task of synthesizing a novel view and a novel expression at once.
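The three evaluation settings mentioned above (novel view, novel expression, and both combined) can be illustrated with a small sketch. This is not the researchers' code: the function names `split_frames` and `mean_squared_error`, the camera and expression labels, and the use of mean squared error as the reconstruction metric are all assumptions made for illustration.

```python
from itertools import product


def split_frames(cameras, expressions, held_out_cameras, held_out_expressions):
    """Partition (camera, expression) pairs into a training group and three test groups.

    Hypothetical helper: the names and the grouping scheme are illustrative only.
    """
    held_cams = set(held_out_cameras)
    held_exprs = set(held_out_expressions)
    groups = {"train": [], "novel_view": [], "novel_expression": [], "novel_both": []}
    for cam, expr in product(cameras, expressions):
        unseen_cam = cam in held_cams
        unseen_expr = expr in held_exprs
        if unseen_cam and unseen_expr:
            key = "novel_both"        # unseen camera and unseen expression
        elif unseen_cam:
            key = "novel_view"        # unseen camera, seen expression
        elif unseen_expr:
            key = "novel_expression"  # seen camera, unseen expression
        else:
            key = "train"             # both seen during training
        groups[key].append((cam, expr))
    return groups


def mean_squared_error(predicted, target):
    """Average squared difference between two equal-length sequences of pixel values."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Toy labels (assumed): four cameras, three expressions, one of each held out.
cameras = ["cam0", "cam1", "cam2", "cam3"]
expressions = ["neutral", "smile", "frown"]
groups = split_frames(cameras, expressions, ["cam3"], ["frown"])
```

Reporting the error separately for each group is what makes it possible to compare the single-factor settings against the combined one.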
Meta researchers have released the large-scale multi-view dataset for neural face rendering, together with training, evaluation, and visualization code and pretrained models for 13 codec avatars. In addition to the dataset, the researchers conducted an ablation study to explore how different model architectures cope with extrapolating to unseen viewpoints and expressions. They found that the base model benefited from the addition of a spatial bias, a texture warp field, and residual connections. The researchers anticipate that this dataset will help advance facial reconstruction technology and support the community in further VR telepresence research.
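For readers unfamiliar with residual connections, the idea is that a block adds its input back to its output, so stacking more layers does not make it harder to pass information through. The following is a generic, minimal sketch in plain Python, assuming a single dense layer with a ReLU activation; it is not the architecture used in the paper, and the helper names `linear`, `relu`, and `residual_block` are hypothetical.

```python
def linear(x, weights, bias):
    """Dense layer: one output per weight row, computed as dot(row, x) + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x, weights, bias):
    """Compute y = x + relu(W x + b); W must be square so the shapes match."""
    fx = relu(linear(x, weights, bias))
    return [xi + fi for xi, fi in zip(x, fx)]


# With all-zero weights and bias the block reduces to the identity mapping.
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]
identity_out = residual_block([1.0, -2.0], zero_w, zero_b)

# With identity weights, only the positive component is added back.
eye_w = [[1.0, 0.0], [0.0, 1.0]]
eye_out = residual_block([1.0, -2.0], eye_w, zero_b)
```

Because the skip path carries the input through unchanged, a block whose learned part contributes nothing still behaves as the identity, which is one common explanation for why deeper networks with such connections remain trainable.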
This article is a research summary written by Marktechpost Staff based on the research paper 'Multiface: A Dataset for Neural Face Rendering'. All credit for this research goes to the researchers on this project.
Nitish is a computer science undergraduate with a keen interest in deep learning. He has completed various deep learning projects and closely follows new advancements in the field.