This AI Research Proposes SMPLer-X: A Generalist Foundation Model for 3D/4D Human Motion Capture from Monocular Inputs

The animation, gaming, and fashion sectors all stand to benefit from expressive human pose and shape estimation (EHPS) from monocular images or videos. To capture the body, face, and hands accurately, this task typically relies on parametric human models such as SMPL-X. Recent years have seen an influx of new datasets, giving the community more opportunities to study factors such as capture environment, pose distribution, body visibility, and camera viewpoint. However, state-of-the-art approaches are still trained on only a small number of these datasets, which creates a performance bottleneck across scenarios and impedes generalization to unseen ones.
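To make the idea of a parametric human model more concrete, here is a minimal sketch of the kind of parameter vector an EHPS method regresses for SMPL-X. The class name `SMPLXParams` and its helper are hypothetical and are not taken from the SMPLer-X code. The joint counts follow the commonly used SMPL-X layout (21 body joints, 15 joints per hand, one jaw joint, each as a 3-value axis-angle rotation), and the 10 shape and 10 expression coefficients are a common default that is assumed here. Eye poses are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed layout: common SMPL-X conventions, not confirmed by the article.
NUM_BODY_JOINTS = 21   # body joints, excluding the global root orientation
NUM_HAND_JOINTS = 15   # joints per hand
NUM_BETAS = 10         # shape coefficients (a common default)
NUM_EXPRESSION = 10    # facial expression coefficients (a common default)


def _zeros(n: int) -> List[float]:
    """Return a list of n zeros (a neutral pose or mean shape)."""
    return [0.0] * n


@dataclass
class SMPLXParams:
    """Hypothetical container for the values an EHPS model predicts per person."""
    global_orient: List[float] = field(default_factory=lambda: _zeros(3))
    body_pose: List[float] = field(default_factory=lambda: _zeros(NUM_BODY_JOINTS * 3))
    left_hand_pose: List[float] = field(default_factory=lambda: _zeros(NUM_HAND_JOINTS * 3))
    right_hand_pose: List[float] = field(default_factory=lambda: _zeros(NUM_HAND_JOINTS * 3))
    jaw_pose: List[float] = field(default_factory=lambda: _zeros(3))
    betas: List[float] = field(default_factory=lambda: _zeros(NUM_BETAS))
    expression: List[float] = field(default_factory=lambda: _zeros(NUM_EXPRESSION))

    def num_values(self) -> int:
        """Total number of scalar values across all parameter groups."""
        groups = (
            self.global_orient, self.body_pose, self.left_hand_pose,
            self.right_hand_pose, self.jaw_pose, self.betas, self.expression,
        )
        return sum(len(g) for g in groups)


params = SMPLXParams()
print(params.num_values())
```

Under these assumptions, one image yields a compact vector of pose, shape, and expression values per person, which the parametric model then turns into a full body mesh with articulated hands and face.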

To build reliable, widely applicable EHPS models, the researchers set out to analyze the available datasets thoroughly. They created the first systematic benchmark for EHPS using 32 datasets and assessed their performance against four key standards. The benchmark reveals significant inconsistencies between benchmarks, highlights the complexity of the overall EHPS landscape, and calls for data scaling to bridge the domain gaps between scenarios. The analysis also shows the need to reevaluate which existing datasets are used for EHPS, arguing for a switch to more competitive alternatives that generalize better.

Their research emphasizes the value of combining several datasets to benefit from their complementary nature. They also examine in detail the factors that affect how well these datasets transfer. The findings yield practical advice for future dataset collection: 1) Datasets do not need to be particularly large to be useful, as long as they contain more than 100K instances. 2) If an in-the-wild (including outdoor) collection is not feasible, diverse indoor scenes are an excellent alternative. 3) Synthetic datasets, despite detectable domain gaps, are becoming surprisingly effective. 4) In the absence of SMPL-X annotations, pseudo-SMPL-X labels are helpful.

Using the insights from the benchmark, researchers from Nanyang Technological University, SenseTime Research, Shanghai AI Laboratory, The University of Tokyo, and the International Digital Economy Academy (IDEA) created SMPLer-X. This generalist foundation model is trained on a diverse set of datasets and delivers remarkably balanced results across scenarios. The work demonstrates the power of massive, carefully selected data. They developed SMPLer-X with a minimalist design philosophy to decouple it from algorithmic research: SMPLer-X has a very simple architecture with only the most essential components for EHPS. Rather than offering a rigorous study of algorithmic aspects, SMPLer-X is intended to enable large-scale data and parameter scaling and to serve as a baseline for future research in the field.

Experiments with various data combinations and model sizes yield a comprehensive model that outperforms all benchmark results and challenges the widespread practice of training on a limited set of datasets. Their foundation models reduced the mean primary errors on five major benchmarks (AGORA, UBody, EgoBody, 3DPW, and EHF) from over 110 mm to below 70 mm, and they show impressive generalization by adapting successfully to new scenarios such as RenBody and ARCTIC. The researchers also demonstrate that fine-tuning their generalist foundation models turns them into domain-specific specialists with exceptional performance across the board.

Specifically, the same data selection strategy enables their specialist models to achieve state-of-the-art (SOTA) performance on EgoBody, UBody, and EHF, and to set new records on the AGORA leaderboard, becoming the first model to reach 107.2 mm NMVE (an 11.0% improvement). They make three distinct contributions. 1) Using extensive EHPS datasets, they construct the first systematic benchmark, which offers crucial guidance for scaling up training data towards reliable and transferable EHPS. 2) They investigate both data and model scaling to build a generalist foundation model that delivers balanced results across many scenarios and extends effectively to unseen datasets. 3) They fine-tune their foundation model into a powerful specialist on various benchmarks by extending the data selection strategy.
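For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how a vertex-based error and a relative improvement can be computed. It assumes that the mean vertex error is the average Euclidean distance between corresponding predicted and ground-truth mesh vertices, that NMVE on AGORA is the mean vertex error divided by the detection F1 score, and that a percentage improvement is measured relative to the previous error. The function names are hypothetical, all inputs are made-up toy values, and none of this is taken from the SMPLer-X evaluation code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_vertex_error(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average Euclidean distance between corresponding vertices.

    The result is in the same unit as the inputs (e.g. millimetres).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def normalized_error(error_mm: float, f1_score: float) -> float:
    """Assumed AGORA-style normalization: divide the error by detection F1.

    A lower F1 (more missed or spurious detections) inflates the error.
    """
    if not 0.0 < f1_score <= 1.0:
        raise ValueError("f1_score must be in (0, 1]")
    return error_mm / f1_score


def relative_improvement(old_error: float, new_error: float) -> float:
    """Fractional error reduction relative to the old error."""
    if old_error <= 0.0:
        raise ValueError("old_error must be positive")
    return (old_error - new_error) / old_error


# Toy example with made-up values (not results from the paper).
pred = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
gt = [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]
mve = mean_vertex_error(pred, gt)            # distances 5.0 and 0.0
nmve = normalized_error(mve, f1_score=0.5)   # penalized by imperfect detection
print(mve, nmve)
```

Because the normalized metric divides by F1, a method can only score well by being accurate on the people it detects and by detecting most of them, which is why it suits multi-person scenes.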


Check out the Paper, Project Page, and GitHub. All credit for this research goes to the researchers on this project.


Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
