Researchers Develop ‘Garment4D’: A Garment Reconstruction Model Using Point Cloud Sequences


Garment reconstruction is used in virtual try-on, virtual reality/augmented reality, and visual effects. Extensive efforts have been made to reconstruct the human body and clothing as a whole using implicit or volumetric representations. In many situations, however, a controllable garment model is desirable.

In a new study, a group of researchers focused on the parametric modeling of 3D clothing, which provides two benefits. First, clothing can be separated from the human body. Second, the reconstructed meshes share a consistent topology, which makes it possible to perform downstream tasks that require high interpretability.

The team decided to tackle the challenge from a 3D standpoint, namely point cloud sequences, rather than following earlier parametric methods for garment reconstruction that take 2D RGB images as input, for several reasons. 3D inputs reduce the scale and pose ambiguities that are difficult to avoid when working with 2D images, and exploiting temporal information is critical for capturing garment dynamics, which has received little attention. Furthermore, recent advances in 3D sensors (e.g., LiDAR) have reduced the cost and difficulty of obtaining point clouds, making 3D point clouds easier to use for research and commercial purposes.

Garment4D, the proposed garment reconstruction framework, has three primary components: (a) sequential garment registration, (b) canonical garment estimation, and (c) posed garment reconstruction. In the registration phase, many sequences of garment meshes are constructed; mesh topology differs across sequences but is shared within each sequence. An optimization-based method registers one frame from each sequence to a template mesh for each garment type (e.g., T-shirt, trousers, skirt).

Then, within each sequence, barycentric interpolation is used to re-mesh the remaining frames to match the template mesh’s topology. Following the practice of prior parametric techniques, a semantic-aware garment PCA coefficients encoder, which takes the dressed human point cloud sequences as input, estimates the canonical garment mesh for each sequence as the first stage of garment reconstruction.
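In such parametric setups, the encoder predicts a small vector of PCA coefficients, and the canonical mesh is decoded linearly from a learned mean shape and basis. The following is a minimal NumPy sketch of that decoding step; the dimensions and the random mean/basis arrays are placeholders, not values from the released model:

```python
import numpy as np

# Placeholder dimensions: a garment template with V vertices, K PCA components.
V, K = 5000, 10

rng = np.random.default_rng(0)
# Random stand-ins for quantities that would be learned from registered meshes:
mean_shape = rng.normal(size=(V, 3))   # mean canonical garment
basis = rng.normal(size=(K, V, 3))     # top-K PCA shape directions

def decode_garment(coeffs):
    """Reconstruct canonical garment vertices from K PCA coefficients."""
    # Contract the K coefficients against the K basis shapes -> (V, 3).
    return mean_shape + np.tensordot(coeffs, basis, axes=1)

verts = decode_garment(rng.normal(size=K))
print(verts.shape)  # (5000, 3)
```

The appeal of this representation is that the entire garment shape is controlled by only K numbers, which is what makes the encoder's regression target compact.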


The complications arise in the posed garment reconstruction stage, for two reasons. First, learning low-level geometric features directly is difficult because of the disordered and unstructured nature of point clouds. Second, capturing the garment dynamics induced by interactions between the human body and clothing is hard. Modeling the non-rigid deformation of loose garments (e.g., skirts), which depends on both the current human pose and past human motion, is particularly problematic.

To overcome these issues, Interpolated Linear Blend Skinning (LBS) is first applied to the predicted canonical garments to produce posed proposals. Unlike methods that rely on the SMPL+D model to skin the garment, the proposed Interpolated LBS can skin loose clothing without producing artifacts, and the procedure requires no prior knowledge. Once the proposals are obtained, a Proposal-Guided Hierarchical Feature Network coupled with an Iterative Graph Convolution Network (GCN) performs efficient geometric feature learning. Meanwhile, a Temporal Transformer performs temporal fusion to capture smooth garment dynamics.
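Standard LBS, which the Interpolated variant builds on, poses each vertex with a per-vertex weighted blend of rigid bone transforms. A minimal sketch with hypothetical shapes (and without the interpolation step that Garment4D adds) looks like this:

```python
import numpy as np

def linear_blend_skinning(verts, weights, transforms):
    """Pose canonical vertices with plain Linear Blend Skinning.

    verts:      (V, 3) canonical garment vertices
    weights:    (V, J) per-vertex skinning weights (each row sums to 1)
    transforms: (J, 4, 4) rigid bone transforms for the current pose
    """
    V = verts.shape[0]
    homo = np.concatenate([verts, np.ones((V, 1))], axis=1)   # (V, 4) homogeneous
    # Blend the J bone transforms per vertex, then apply the blended transform.
    blended = np.einsum('vj,jab->vab', weights, transforms)   # (V, 4, 4)
    posed = np.einsum('vab,vb->va', blended, homo)            # (V, 4)
    return posed[:, :3]

# Sanity check: identity bone transforms leave the canonical garment unchanged.
V, J = 4, 2
verts = np.arange(V * 3, dtype=float).reshape(V, 3)
weights = np.full((V, J), 1.0 / J)
transforms = np.stack([np.eye(4)] * J)
posed = linear_blend_skinning(verts, weights, transforms)
```

Plain LBS like this assumes every garment vertex can borrow skinning weights from a nearby body vertex, which is exactly what breaks down for loose skirts and motivates the interpolated weighting in the paper.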

For the experiments, CLOTH3D was adapted to create a garment reconstruction dataset based on point cloud sequences. CLOTH3D is a large-scale synthetic dataset featuring a vast number of human pose sequences and a variety of garment shapes and styles. The point cloud sequence inputs are created by sampling point sets from the 3D human models. Three types of garments were chosen for the experiments: skirts, T-shirts, and trousers. The sequences are split into training and testing sets at an 8:2 ratio.
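An 8:2 split at the sequence level (so that frames from one sequence never appear in both sets) can be sketched as follows; the sequence count and seed are placeholders, not dataset details:

```python
import numpy as np

n_sequences = 100                     # placeholder sequence count
rng = np.random.default_rng(42)       # placeholder seed
perm = rng.permutation(n_sequences)   # shuffle sequence indices

n_train = int(0.8 * n_sequences)      # 8:2 train/test ratio
train_ids, test_ids = perm[:n_train], perm[n_train:]
```

Splitting by whole sequences rather than individual frames avoids leaking near-duplicate frames of one motion into both sets.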

In addition to the synthetic dataset, the team also ran experiments on CAPE, a real human scan dataset. CAPE is a large-scale collection of real clothed-human scans comprising 150k 3D scans of 15 subjects. Because CAPE only releases scanned clothed-human sequences without separable garment meshes, training the network on CAPE from scratch is not practicable. As a result, the CAPE experiments use a network pre-trained on CLOTH3D.

Because this is the first study to tackle garment reconstruction from point cloud sequences, no earlier work is readily available for direct comparison. For canonical garment estimation, the PointNet++ architecture is therefore modified to predict PCA coefficients directly from the input point clouds. For posed garment reconstruction, the Multi-Garment Net (MGN) is adapted to point cloud inputs for comparison.

The estimated canonical skirt recovers the rough shape and length of the skirt from the point cloud sequences. As expected, the Interpolated LBS generates reasonable proposals with smooth surfaces and no artifacts. The plain LBS results, in contrast, exhibit two flaws that make them appear artificial. One is interpenetration between the garment and the human body. The other is a lifted skirt floating in mid-air without touching the leg that is supposed to cause the lift. Because the skirt is not homotopic to the human body, these two defects are unavoidable with plain LBS; they are corrected in the later reconstruction stages.


The results demonstrate Garment4D’s capacity to handle both loose and tight clothing driven by large body motions. Garment4D captures not only the exact shape of garments that are homotopic to the human body, such as T-shirts and trousers, but also the dynamics induced by body movement. For instance, when the leg or arm bends, more of the ankle or forearm is exposed, preserving the length of the trouser legs or sleeves. For skirts, which are not homotopic to the human body, both long and short skirts are faithfully recovered. When the legs are stretched out to the sides, the short skirt becomes tighter and the natural deformation can be seen. Qualitative and quantitative results from extensive experiments demonstrate the usefulness of the proposed framework.



Project Page: