Max Planck Institute and Facebook Reality Labs Develop A Model That Performs Human Re-Rendering From A Single Image

A team of researchers from the Max Planck Institute for Informatics and Facebook Reality Labs has developed an end-to-end trainable technique that performs human re-rendering from a single image. It can render its subject in various user-defined poses, with clothing transferred from other reference images. Human re-rendering has numerous practical applications, from virtual and augmented reality to 3D video. Designing algorithms that can render clothed humans in different poses from a single image is challenging.

The model takes a single image of a clothed human as input. In the first step, the pipeline uses DensePose, a dense pose estimation method that maps the human pixels in an RGB image to the 3D surface of the human body. DensePose predicts dense correspondences between the input image and a Skinned Multi-Person Linear (SMPL) model. Because SMPL is a parametric human surface model, the subject can then be easily reposed to the target pose.
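The correspondence step can be illustrated with a minimal Python sketch. It assumes that each human pixel already carries a (u, v) coordinate on a single square texture atlas; real DensePose output also includes a body-part index per pixel, which is omitted here for brevity. The function name `extract_partial_texture` and its parameters are hypothetical and do not come from the researchers' code.

```python
import numpy as np

def extract_partial_texture(image, uv, mask, tex_size=8):
    """Scatter visible pixel colors into a UV texture map.

    image: (H, W, 3) array of pixel colors.
    uv:    (H, W, 2) array of surface coordinates in [0, 1] (assumed given).
    mask:  (H, W) boolean array, True where a pixel belongs to the person.
    Returns the texture and a boolean map of the texels that were filled.
    """
    texture = np.zeros((tex_size, tex_size, 3), dtype=image.dtype)
    filled = np.zeros((tex_size, tex_size), dtype=bool)

    # Convert continuous UV coordinates to integer texel indices.
    cols = np.clip(np.rint(uv[..., 0] * (tex_size - 1)).astype(int), 0, tex_size - 1)
    rows = np.clip(np.rint(uv[..., 1] * (tex_size - 1)).astype(int), 0, tex_size - 1)

    # Only pixels on the person contribute; unseen texels stay empty.
    texture[rows[mask], cols[mask]] = image[mask]
    filled[rows[mask], cols[mask]] = True
    return texture, filled
```

Texels that are never visible in the input image remain unfilled, which is why a single view yields only a partial texture of the body surface.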

In the next step, a U-Net-based network dubbed FeatureNet produces a full UV feature map that holds a d-dimensional feature representation. Given a target pose as input, the full UV feature map is then rendered to a d-dimensional feature image that matches the target pose. Finally, a generator network called RenderNet, based on the Pix2PixHD model, turns this feature image into a realistic image of the reposed subject.
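Rendering the UV feature map into a feature image can be sketched as a per-pixel lookup. The Python example below is an illustration under stated assumptions, not the researchers' implementation: it assumes the target pose is already available as per-pixel (u, v) coordinates on a single square atlas, and it uses nearest-neighbour sampling for brevity. The name `render_feature_image` is hypothetical.

```python
import numpy as np

def render_feature_image(feature_map, target_uv, target_mask):
    """Look up a d-dimensional feature for every pixel of the target pose.

    feature_map: (T, T, d) full UV feature map (square atlas assumed).
    target_uv:   (H, W, 2) surface coordinates in [0, 1] for the target pose.
    target_mask: (H, W) boolean array, True where the posed body is visible.
    Returns an (H, W, d) feature image; background pixels are zero.
    """
    tex_size = feature_map.shape[0]

    # Nearest texel for each target pixel.
    cols = np.clip(np.rint(target_uv[..., 0] * (tex_size - 1)).astype(int), 0, tex_size - 1)
    rows = np.clip(np.rint(target_uv[..., 1] * (tex_size - 1)).astype(int), 0, tex_size - 1)

    # Advanced indexing gathers one feature vector per pixel (returns a copy).
    feature_image = feature_map[rows, cols]
    feature_image[~target_mask] = 0
    return feature_image
```

In the described pipeline, a feature image of this kind is what the RenderNet generator receives as input.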

The experiments were carried out on the In-Shop Clothes Retrieval Benchmark of the DeepFashion dataset. Compared to other methods such as Coordinate Based Inpainting (CBI) and Variational U-Net (VUnet), the proposed model delivered higher realism and preserved garment details more accurately. Although the model isn’t specifically designed for video, the researchers claimed that it can also generate realistic renderings for video sequences, including garment and motion transfer from a single source image.