While recent attempts to solve the problem of head reenactment using a single reference image have shown promising results, most of them perform poorly in terms of photo-realism and fail at preserving identity. Researchers from Imperial College London, Huawei Technologies (UK), and the University of Sussex propose ‘HeadGAN’, a novel one-shot GAN-based method for talking head animation and editing.
The research group took a different approach from most existing few-shot methods, using 3D face representations to condition synthesis. This lets them benefit from the prior knowledge of expression and identity disentanglement encoded in 3D Morphable Models (3DMMs).
Their decision to model faces with 3DMMs allows HeadGAN to function as:
- A real-time reenactment system operating at ∼ 20 fps
- An efficient method for facial video compression and reconstruction
- A facial expression editing method
- A novel view synthesis system, including face frontalisation.
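The disentanglement the researchers exploit comes from the linear structure of a 3DMM, where a face shape is the sum of a mean shape plus offsets along separate identity and expression bases. The sketch below illustrates that idea with random placeholder bases; the dimensions and variable names are hypothetical and do not reproduce the actual model used in the paper.

```python
import numpy as np

# Hypothetical sizes; real 3DMMs (e.g. the Basel Face Model) use their own bases.
N_VERTS = 5000        # mesh vertices (each with x, y, z)
N_ID, N_EXP = 80, 30  # identity / expression basis dimensions

rng = np.random.default_rng(0)
mean_shape = rng.standard_normal(3 * N_VERTS)
U_id = rng.standard_normal((3 * N_VERTS, N_ID))    # identity basis
U_exp = rng.standard_normal((3 * N_VERTS, N_EXP))  # expression basis

def reconstruct(alpha_id, alpha_exp):
    """Linear 3DMM: shape = mean + identity offsets + expression offsets."""
    return mean_shape + U_id @ alpha_id + U_exp @ alpha_exp

# Disentanglement in action: keep the source subject's identity coefficients
# but swap in the driving frame's expression coefficients, transferring the
# expression without altering who the person is.
src_id = rng.standard_normal(N_ID)
drv_exp = rng.standard_normal(N_EXP)
reenacted = reconstruct(src_id, drv_exp)
print(reenacted.shape)  # (15000,)
```

Because identity and expression live in separate subspaces, reenactment reduces to mixing coefficient vectors, which is what makes the 3DMM conditioning signal so convenient for a generator.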
The researchers also condition the generative process on speech features, enabling accurate mouth synthesis, as demonstrated in their automated lip-reading experiment.
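A common way to condition image synthesis on speech is to broadcast an audio feature vector over the spatial grid and concatenate it with the visual feature maps, so every decoder location sees the speech signal. The sketch below shows that pattern with placeholder shapes; the dimensions are assumptions for illustration and are not taken from the paper's architecture.

```python
import numpy as np

# Hypothetical shapes: a spatial feature map from an image encoder and a
# per-frame audio embedding (e.g. from a spectrogram encoder).
H, W, C_IMG = 64, 64, 256
C_AUDIO = 128

img_feat = np.zeros((C_IMG, H, W), dtype=np.float32)
audio_feat = np.ones((C_AUDIO,), dtype=np.float32)

# Tile the audio vector across the spatial grid, then concatenate channel-wise.
audio_map = np.broadcast_to(audio_feat[:, None, None], (C_AUDIO, H, W))
conditioned = np.concatenate([img_feat, audio_map], axis=0)
print(conditioned.shape)  # (384, 64, 64)
```

The concatenated tensor then feeds the subsequent synthesis layers, letting audio cues influence the generated mouth region at every spatial position.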
In the research paper, the proposed method was extensively compared with state-of-the-art methods and showed superior image quality and performance in terms of standard GAN metrics. The comparisons covered the tasks of reconstruction, reenactment and frontalisation, even against models trained on the larger VoxCeleb2 dataset.
The research paper presents the proposed ‘HeadGAN’ method: a novel one-shot approach to animating heads, driven by 3D facial data and audio features. The framework achieves superior reenactment performance and higher photo-realism than SOTA methods, and lends itself to applications such as reconstruction, pose and facial expression editing, and frontalisation.