This article is based on the research paper 'Few-Shot Head Swapping in the Wild'.
Deepfakes are fake media in which a person’s likeness is replaced with someone else’s in an existing photograph or video. While faking information is not new, deepfakes use advanced machine learning and artificial intelligence techniques to modify or synthesize visual and auditory content with a high potential for deception. Not only facial characteristics but also head shape and hairstyle have a significant impact on how humans perceive identity.
The head swapping task aims to place a source head onto a target body seamlessly, a capability valuable in various entertainment contexts.
While face swapping has received considerable attention, head swapping remains little studied, especially in the few-shot setting. Several papers have attempted to apply face-swapping techniques to head swapping, but head swapping is an intrinsically harder task because of its particular requirements:
- Head swapping must capture the structural information of the whole head and of non-rigid hair, beyond the precise identity and expression modeling that face swapping requires. As a result, identity-extraction algorithms from earlier face-swapping work cannot be applied directly to head swapping.
- Changing head shapes and hairstyles leaves a significant mismatched region between the edges of the swapped head and the background.
- The color disparity between the source and target skin tones must be handled.
Researchers from Baidu and the South China University of Technology take this forward with the Head Swapper (HeSer), which swaps the entire head, not just the face. The key goal was to align the source head with the target and to handle both color and background mismatches cleanly within a single blender.
HeSer performs few-shot head swapping with two carefully designed modules. The team first built a Head2Head Aligner that holistically transfers pose and expression information from the target to the source head by examining multi-scale data. It renders the source head with the same pose and expression as the target image: by incorporating multi-scale local and global information from both photos, a style-based generator balances identity, expression, and pose information. Subject-specific fine-tuning can further improve identity preservation and pose consistency.
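HeSer's Head2Head Aligner is a learned generative network, but the geometric core of "aligning the source head to the target's pose" can be illustrated with a classical stand-in: estimating a similarity transform (scale, rotation, translation) between source and target facial landmarks, in the style of the Umeyama/Procrustes method. The sketch below is illustrative only and is not the paper's method; the function name and landmark layout are assumptions.

```python
import numpy as np

def similarity_transform(src, dst):
    """Estimate scale s, rotation R, translation t with dst ~= s * R @ src + t.

    src, dst: (N, 2) arrays of corresponding 2D landmarks.
    Classical Umeyama-style least-squares fit, NOT HeSer's learned aligner.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d          # centered point sets
    cov = dc.T @ sc / len(src)               # cross-covariance of dst vs. src
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))       # reflection guard
    D = np.diag([1.0, d])
    R = U @ D @ Vt                           # optimal rotation
    var_s = (sc ** 2).sum() / len(src)       # total variance of source points
    scale = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - scale * (R @ mu_s)
    return scale, R, t
```

In a head-swapping pipeline, such a transform could warp the source head crop toward the target's head pose before a learned generator refines expression and appearance; HeSer instead handles pose and expression jointly inside its style-based generator.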
Next, they introduce a Head2Scene Blender to address skin color differences and head-background mismatches during swapping, simultaneously adjusting facial skin color and filling the mismatched regions in the background around the head.
To accomplish seamless blending, the researchers use a Semantic-Guided Color Reference Creation method together with a Blending UNet. In their experiments, the model achieves superior head swapping results across a variety of settings.
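The Blending UNet and the color reference creation are learned components, but the two sub-problems they target, matching skin color statistics and compositing the head into the scene, can be sketched with classical operations: a Reinhard-style per-channel mean/std color transfer computed over skin-mask pixels, followed by alpha compositing. This is a simplified analogy under assumed inputs, not HeSer's actual blending; function names and the mask convention are illustrative.

```python
import numpy as np

def match_skin_color(source, target, src_mask, tgt_mask):
    """Shift source colors so skin statistics match the target's.

    source, target: (H, W, 3) uint8 images; src_mask, tgt_mask: (H, W) bool
    skin masks. Per-channel mean/std transfer (Reinhard-style), NOT the
    paper's Semantic-Guided Color Reference Creation.
    """
    out = source.astype(np.float64).copy()
    for c in range(3):
        s = source[..., c][src_mask].astype(np.float64)
        t = target[..., c][tgt_mask].astype(np.float64)
        # Normalize by source skin stats, re-scale to target skin stats.
        out[..., c] = (out[..., c] - s.mean()) / (s.std() + 1e-8) * t.std() + t.mean()
    return np.clip(out, 0, 255).astype(np.uint8)

def blend(head, scene, alpha):
    """Alpha-composite the recolored head onto the scene.

    alpha: (H, W) float mask in [0, 1], where 1 marks the head region.
    """
    a = alpha[..., None]
    return (a * head.astype(np.float64) + (1 - a) * scene.astype(np.float64)).astype(np.uint8)
```

A learned blender like HeSer's additionally inpaints the background regions exposed when the new head has a different shape or hairstyle, which simple alpha compositing cannot do.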
Recent generative models have significantly impacted identity security, image authenticity, and related areas. The team therefore plans to share HeSer’s findings with the face/head forgery detection community in order to promote the healthy growth of AI technology.