Tencent AI Research Unveils ‘PIRenderer’, An AI Model To Control The Generation Of Faces Via Semantic Neural Rendering

Paper: https://arxiv.org/pdf/2109.08379.pdf

Portrait images are an essential type of photograph found in everyday life. The ability to intuitively control the pose and expression of a given face, whether in virtual reality or on film, is an important task with applications ranging from filmmaking to communication design for next-generation interfaces. But such editing is very challenging, since it requires the algorithm to perceive the reliable 3D geometry of a given face. The human visual system is particularly acute toward portrait images, which poses an additional challenge for the algorithm. The task also demands photo-realistic faces and backgrounds, which makes such content even harder to create with current technology. 

In their research paper, researchers from Peking University and Tencent propose a neural rendering model, ‘PIRenderer’. Given an input source portrait image and target 3DMM (3D Morphable Model) parameters, the proposed model can generate photo-realistic results with accurate motions.

Source: https://arxiv.org/pdf/2109.08379.pdf

The ‘PIRenderer’ model consists of three parts: the Mapping Network, the Warping Network, and the Editing Network. The mapping network produces latent vectors from motion descriptors. The warping network estimates the deformation between the source and the desired target, and uses it to warp the source image into a coarse result. The editing network then refines the coarse image into the final output. 
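The three-stage pipeline can be sketched in PyTorch as below. This is only a minimal illustration: the module sizes, the 73-dimensional motion descriptor, and the single-layer networks are assumptions for readability, not the paper's actual architecture (see the official repo for the real implementation).

```python
# Minimal sketch of the PIRenderer three-stage pipeline.
# All layer sizes here are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MappingNetwork(nn.Module):
    """Maps a 3DMM-style motion descriptor to a latent vector."""
    def __init__(self, descriptor_dim=73, latent_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(descriptor_dim, latent_dim), nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim))

    def forward(self, motion):
        return self.mlp(motion)

class WarpingNetwork(nn.Module):
    """Predicts a dense flow field from the source image and the latent,
    then warps the source into a coarse result."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.flow_head = nn.Conv2d(3 + latent_dim, 2, kernel_size=3, padding=1)

    def forward(self, src, z):
        b, _, h, w = src.shape
        z_map = z.view(b, -1, 1, 1).expand(b, z.shape[1], h, w)
        flow = self.flow_head(torch.cat([src, z_map], dim=1))  # (B, 2, H, W)
        # Sampling grid = identity grid + predicted offsets.
        yy, xx = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack([xx, yy], dim=-1).unsqueeze(0).expand(b, h, w, 2)
        grid = grid + flow.permute(0, 2, 3, 1)
        return F.grid_sample(src, grid, align_corners=True)  # coarse result

class EditingNetwork(nn.Module):
    """Refines the coarse warped image into the final output."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Conv2d(6, 3, kernel_size=3, padding=1)

    def forward(self, src, coarse):
        return self.refine(torch.cat([src, coarse], dim=1))

# Chaining the three stages:
mapping, warping, editing = MappingNetwork(), WarpingNetwork(), EditingNetwork()
src = torch.randn(1, 3, 64, 64)   # source portrait image
motion = torch.randn(1, 73)       # target 3DMM motion descriptor
z = mapping(motion)
coarse = warping(src, z)
final = editing(src, coarse)
print(final.shape)  # torch.Size([1, 3, 64, 64])
```

The untrained single-conv networks here only show how data flows between the three stages; the real model stacks many layers and is trained with reconstruction and adversarial losses.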

The research paper demonstrates the superiority and versatility of the ‘PIRenderer’ model. The proposed model enables intuitive image control, where target images can be edited with user-specified motions, and it generates realistic results for indirect portrait editing. Furthermore, the authors show the model's potential as an efficient neural renderer by extending it to audio-driven facial reenactment. Experiments in the paper show that the model generates various and vivid motions from an audio stream, which can be transferred onto videos of arbitrary people.
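Conceptually, the audio-driven extension adds a sequence model in front of the renderer: audio features are mapped to a stream of motion descriptors, and each descriptor drives one rendered frame. The sketch below is a hypothetical illustration of that idea; the module name, the GRU architecture, and all dimensions are assumptions, not the paper's design.

```python
# Hypothetical sketch of the audio-driven front end: a recurrent network maps
# per-frame audio features to a sequence of 3DMM-style motion descriptors,
# which a renderer like PIRenderer could turn into video frames one by one.
# All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AudioToMotion(nn.Module):
    def __init__(self, audio_dim=80, hidden=128, descriptor_dim=73):
        super().__init__()
        self.rnn = nn.GRU(audio_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, descriptor_dim)

    def forward(self, audio_feats):       # (B, T, audio_dim)
        h, _ = self.rnn(audio_feats)
        return self.head(h)               # (B, T, descriptor_dim)

audio_feats = torch.randn(1, 100, 80)     # e.g. 100 frames of mel features
motions = AudioToMotion()(audio_feats)    # one motion descriptor per frame
print(motions.shape)  # torch.Size([1, 100, 73])
```

Because the motions are subject-agnostic descriptors rather than pixels, the same descriptor sequence can drive portrait images of arbitrary people.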

Source: https://arxiv.org/pdf/2109.08379.pdf (Audio-Driven Facial Reenactment)

Key Takeaways

  • The research group proposed a portrait image generation model, PIRenderer, which enables intuitive, photo-realistic editing of facial expressions and head rotations.
  • The ‘PIRenderer’ model was also used to tackle the indirect image editing task, which requires imitating the motions of other individuals. Thanks to the disentangled motion modifications and the efficient neural renderer, the research group extracted subject-agnostic motions and generated realistic videos.
  • The proposed model is an efficient face renderer that generates various and vivid videos from just one portrait image and a driving audio stream.

Paper: https://arxiv.org/pdf/2109.08379.pdf

Code: https://github.com/RenYurui/PIRender#Get-Start