Neural head avatars are a novel and intriguing approach to building virtual head models. They learn the shape and appearance of talking humans from video, sidestepping the difficult physics-based modeling of realistic human avatars. Over the past few years, techniques have emerged that can create realistic avatars from a single image. To construct avatars in this one-shot mode, these methods rely on extensive pretraining on massive datasets of videos of many different people, which provides general knowledge of human appearance.
Despite the excellent results this class of methods has produced, its quality is severely limited by the resolution of the training datasets. This constraint cannot simply be overcome by collecting a higher-resolution dataset, because such a dataset must be both large-scale and diverse: it must contain thousands of people with many frames per person, and varied demographics, lighting, backgrounds, facial expressions, and head poses. Every public dataset that satisfies these requirements is limited in resolution. As a result, even the most modern one-shot avatar methods learn avatars at resolutions of at most 512 × 512.
In a recent publication, Samsung researchers proposed a new model for one-shot neural avatars that achieves state-of-the-art cross-reenactment quality at 512 × 512 resolution. The architecture represents the avatar's appearance as a latent 3D volume, and the researchers proposed a new method of fusing it with latent motion representations. The method includes a novel contrastive loss that achieves a higher degree of disentanglement between the latent motion and appearance representations. Additionally, the team developed a problem-specific gaze loss that improves the realism and precision of eye animation.
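The summary does not give the exact form of the contrastive objective, but the general idea of contrastive disentanglement can be sketched as an InfoNCE-style loss: a motion embedding should match the embedding of the same motion performed by a different person (the positive) and differ from embeddings of other motions (the negatives). Below is a minimal NumPy sketch; the function names, the temperature value, and the embedding layout are illustrative assumptions, not details from the paper.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_motion_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: the anchor motion embedding should be close to
    the positive (same motion, different identity) and far from the
    negatives (different motions). Lower loss = better disentanglement
    of motion from appearance."""
    logits = np.array([cosine(anchor, positive)] +
                      [cosine(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()  # numerical stability before exponentiation
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # cross-entropy with the positive at index 0
```

For example, when the anchor and positive embeddings align and the negatives are orthogonal, the loss approaches zero; swapping the positive and negative drives it up sharply.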
The researchers also demonstrated how to upgrade a model trained on medium-resolution videos to megapixel resolution using an additional dataset of high-quality still photographs. Trained on the same data, the proposed strategy outperforms a standard super-resolution approach on the cross-reenactment task. The team is thus the first to present neural head avatars at true megapixel resolution.
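The summary does not specify the training schedule for this two-dataset setup, but the core idea, mixing batches from a medium-resolution video dataset with batches from a high-resolution still-image dataset, can be sketched as follows. The loader names, the mixing probability, and the fixed seed are assumptions for illustration only.

```python
import itertools
import random

def mixed_batches(video_batches, still_batches, n_steps, p_still=0.5, seed=0):
    """Yield training batches drawn from two sources: medium-resolution
    video frames (which supervise motion) and high-resolution stills
    (which supply megapixel detail for the enhancement stage).
    Each yielded item is (source_name, batch)."""
    rng = random.Random(seed)
    video_it = itertools.cycle(video_batches)
    still_it = itertools.cycle(still_batches)
    for _ in range(n_steps):
        source = "still" if rng.random() < p_still else "video"
        yield source, next(still_it if source == "still" else video_it)
```

In practice the high-resolution stills would only update an enhancement head, while the motion machinery keeps learning from video, but that split is beyond this sketch.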
Because many practical applications of human avatars require real-time or faster-than-real-time rendering, the researchers distilled the megapixel model into a student model that is roughly ten times faster and runs at 130 FPS on a modern GPU. This large speedup is possible because the student is trained for a fixed set of appearances. Additionally, applications built on such a student model, restricted to predefined identities, achieve minimal rendering latency while preventing its use for creating deepfakes of arbitrary people.
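Neither the student's architecture nor the distillation objective is detailed in this summary, but the general recipe, a small identity-conditioned network trained to reproduce the teacher's renders for a fixed identity set, can be sketched like this. All class names, sizes, and the L1 objective are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

class TinyStudent:
    """Sketch of an identity-specialized student model. Appearance is
    baked into a small per-identity embedding table, so at inference the
    network only processes a compact motion code -- one reason such a
    student can render much faster than the full one-shot model."""
    def __init__(self, n_identities, motion_dim=16, out_pixels=64, seed=0):
        rng = np.random.default_rng(seed)
        self.id_table = rng.normal(size=(n_identities, 8))       # learned per-identity codes
        self.W = rng.normal(size=(8 + motion_dim, out_pixels)) * 0.01

    def render(self, identity_idx, motion_code):
        """Map (identity, motion) to a flattened 'image' in [-1, 1]."""
        x = np.concatenate([self.id_table[identity_idx], motion_code])
        return np.tanh(x @ self.W)

def distillation_loss(teacher_render, student_render):
    """Per-pixel L1 distance between teacher and student outputs;
    distillation training minimizes this over the predefined identities."""
    return float(np.abs(teacher_render - student_render).mean())
```

Training would repeatedly render the same identity and motion with both teacher and student and take gradient steps on `distillation_loss`; only the student, never the teacher, ships to the application.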
In summary, Samsung researchers unveiled a novel method for creating high-resolution neural avatars that speeds up rendering while maintaining quality comparable to the full one-shot model. The system has two main limitations, both stemming from the training data. First, both training datasets exhibit a bias toward near-frontal views, which lowers rendering quality for strongly non-frontal head poses. Second, because high-resolution data is available only as static images, the results show some temporal flicker. The team suggests addressing these problems in future work.
This article is a research summary written by Marktechpost staff based on the paper 'MegaPortraits: One-shot Megapixel Neural Head Avatars'. All credit for this research goes to the researchers on this project. Check out the paper and GitHub repository.
Nitish is a computer science undergraduate with a keen interest in deep learning. He has worked on various deep learning projects and closely follows new advancements in the field.