We see digital avatars everywhere, from our favorite chat applications to virtual marketing assistants on our favorite e-commerce websites. They are becoming increasingly popular and integrating quickly into our daily lives. You go into your avatar editor, select skin color, eye shape, accessories, and so on, and have one ready to mimic you in the digital world.
Constructing a digital avatar face manually and using it as a living emoji can be fun, but it only scratches the surface of what is possible. The true potential of digital avatars lies in the ability to become a clone of our entire body. This type of avatar has become an increasingly popular technology in video games and virtual reality (VR) applications.
Generating high-fidelity 3D avatars requires expensive, specialized equipment. Therefore, we only see them in a limited number of applications, such as the professionally captured actors in video games.
What if we could simplify this process? Imagine you could generate a high-fidelity 3D full-body avatar by just using some videos captured in the wild. No professional equipment, no complicated sensor setup to capture every tiny detail, just a camera and a simple recording with a smartphone. This breakthrough in avatar technology could revolutionize many applications in VR, robotics, video games, movies, sports, etc.
The time has arrived. We have a tool that can generate high-fidelity 3D avatars from videos captured in the wild. Time to meet Vid2Avatar.
Vid2Avatar learns 3D human avatars from in-the-wild videos. It needs no ground-truth supervision, no priors extracted from large datasets, and no external segmentation modules. You just give it a video of someone, and it generates a robust 3D avatar for you.
Vid2Avatar has some smart tricks up its sleeve to achieve this. The first step is to separate the human from the background in the scene and model each as a neural field. The authors solve the tasks of scene separation and surface reconstruction directly in 3D, modeling two separate neural fields to learn the human body and the background implicitly. This is normally a challenging task because you need to associate 3D points with the human body without relying on 2D segmentation.
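To give a feel for what "two neural fields composed in 3D" means, here is a minimal, hypothetical sketch. The random linear maps `human_field` and `background_field` are toy stand-ins for the paper's MLPs, and the compositing follows standard volume rendering; none of the names or numbers come from Vid2Avatar itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two implicit networks: each maps a 3D point to a
# (density, rgb) pair. Real systems use trained MLPs; these random linear
# layers only illustrate the interface.
W_h, W_b = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))

def human_field(x):          # x: (N, 3) sample points
    out = x @ W_h
    return np.exp(out[:, 0]), 1 / (1 + np.exp(-out[:, 1:]))  # density, rgb

def background_field(x):
    out = x @ W_b
    return np.exp(out[:, 0]), 1 / (1 + np.exp(-out[:, 1:]))

def render_ray(origin, direction, n_samples=32, t_far=4.0):
    """Alpha-composite samples from BOTH fields along one camera ray."""
    t = np.linspace(0.1, t_far, n_samples)
    pts = origin + t[:, None] * direction                 # (N, 3)
    sig_h, rgb_h = human_field(pts)
    sig_b, rgb_b = background_field(pts)
    sigma = sig_h + sig_b                                 # combined density
    rgb = (sig_h[:, None] * rgb_h + sig_b[:, None] * rgb_b) / sigma[:, None]
    delta = t[1] - t[0]
    alpha = 1 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1 - alpha[:-1]]))
    weights = trans * alpha                               # per-sample weight
    color = (weights[:, None] * rgb).sum(axis=0)          # rendered pixel
    human_mass = (weights * sig_h / sigma).sum()          # soft "human" mask
    return color, human_mass

color, mask = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
```

Because every sample carries a per-field density, the same rendering pass that produces the pixel color also produces a soft human/background assignment — which is why no external 2D segmentation module is needed.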
The human body is modeled using a single temporally consistent representation of the human shape and texture in canonical space. This representation is learned from deformed observations using an inverse mapping of a parametric body model. Moreover, Vid2Avatar uses an optimization algorithm to jointly adjust parameters related to the background, the human subject, and the subject's poses so as to best fit the sequence of video frames.
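The idea of an "inverse mapping" can be sketched with a single rigid bone. In an SMPL-like parametric model, a canonical point is posed by blending per-bone transforms; inverting that transform brings observed points from every frame back into one shared canonical space where the shape and texture field is learned. This one-bone example is a simplification of the real blend-skinning inverse:

```python
import numpy as np

def pose_transform(theta, t):
    """Hypothetical single 'bone': rotation about z by theta, then translation t."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return R, np.asarray(t, dtype=float)

def to_canonical(x_deformed, R, t):
    """Invert the rigid transform: x_c = R^T (x_d - t)."""
    return R.T @ (x_deformed - t)

R, t = pose_transform(np.pi / 2, [0.0, 0.0, 0.5])
x_c = np.array([1.0, 0.0, 0.0])       # a point on the canonical body
x_d = R @ x_c + t                     # the same point, observed in a posed frame
x_back = to_canonical(x_d, R, t)      # mapped back to canonical space
```

Because every frame's observations land in the same canonical space, one temporally consistent representation can absorb evidence from the whole video instead of fitting each frame independently.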
To further improve the separation, Vid2Avatar represents the scene in 3D so that the human body is decomposed from the background, making it easier to analyze the motion and appearance of each separately. It also introduces novel objectives, such as encouraging a crisp boundary between the human body and the background, that guide the optimization toward more accurate and detailed reconstructions of the scene.
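As an illustration of what a "clear boundary" objective can look like, here is a hedged sketch: a binary-entropy-style penalty that pushes each ray's accumulated human opacity toward 0 or 1, so every pixel is crisply assigned to either the person or the background. The paper's actual decomposition losses differ in detail; this only conveys the flavor.

```python
import numpy as np

def boundary_loss(human_opacity, eps=1e-6):
    """Penalize ambiguous (near-0.5) human/background assignments.

    human_opacity: per-ray accumulated opacity of the human field, in [0, 1].
    Returns the mean binary entropy, which is lowest for values near 0 or 1.
    """
    o = np.clip(human_opacity, eps, 1.0 - eps)
    return float(np.mean(-(o * np.log(o) + (1.0 - o) * np.log(1.0 - o))))

sharp = np.array([0.01, 0.99, 0.02])   # nearly binary assignment: low loss
fuzzy = np.array([0.50, 0.40, 0.60])   # ambiguous boundary: high loss
```

Minimizing such a term alongside the reconstruction loss discourages the optimizer from smearing the person's silhouette into the background.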
Overall, Vid2Avatar proposes a global optimization approach for robust, high-fidelity human body reconstruction. The method uses videos captured in the wild without requiring any further information. Its carefully designed components achieve robust modeling, and in the end, we get 3D avatars that could be used in many applications.
Check out the Paper and Project. All Credit For This Research Goes To the Researchers on This Project.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.