Meet TRACE: A New AI Approach for Accurate 3D Human Pose and Shape Estimation with Global Coordinate Tracking

Recent advances in 3D human pose and shape (HPS) estimation benefit many application areas. However, most approaches consider only a single frame at a time and estimate human positions relative to the camera. These techniques also do not track individuals over time, so they cannot recover their global trajectories. The problem is worse in hand-held videos, which are usually shot with a jittery, shaky camera.

To address these problems, researchers from the Harbin Institute of Technology, Explore Academy of, Max Planck Institute for Intelligent Systems, and propose TRACE, a method that reasons end-to-end about people in scenes using a 5D representation (space, time, and identity). TRACE has several novel architectural features. Most notably, it employs two new maps to reason about people's 3D motion in time and space, from both the camera's perspective and the world's perspective. An additional memory module lets it keep track of individuals even after long absences from view. TRACE recovers 3D human models in global coordinates from moving cameras in a single step while simultaneously tracking their movements.
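To make the "5D" idea concrete, here is a minimal Python sketch, assuming each observation is simply a 3D position tagged with a frame index and an identity. The `Observation` class and `trajectories` function are hypothetical illustrations, not part of TRACE's code; they only show how adding time and identity to 3D position turns per-frame detections into per-person trajectories.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical 5D observation: identity, time, and 3D position."""
    track_id: int  # identity
    t: int  # time (frame index)
    x: float  # 3D position, e.g. in meters
    y: float
    z: float


def trajectories(observations):
    """Group observations by identity and order each group by time."""
    by_id = defaultdict(list)
    for obs in observations:
        by_id[obs.track_id].append(obs)
    return {
        tid: [(o.x, o.y, o.z) for o in sorted(group, key=lambda o: o.t)]
        for tid, group in by_id.items()
    }


observations = [
    Observation(track_id=1, t=1, x=0.5, y=0.0, z=3.0),
    Observation(track_id=2, t=0, x=-1.0, y=0.0, z=4.0),
    Observation(track_id=1, t=0, x=0.0, y=0.0, z=3.0),
]
print(trajectories(observations))
# {1: [(0.0, 0.0, 3.0), (0.5, 0.0, 3.0)], 2: [(-1.0, 0.0, 4.0)]}
```

Without the identity field, the two people's positions would be an unordered cloud of points; without the time field, person 1's two positions could not be ordered into a path.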

The goal is to reconstruct each person's 3D position, shape, identity, and motion in global coordinates simultaneously. To do this, TRACE first extracts temporal information and then uses a dedicated head network to decode each sub-task. Specifically, TRACE uses two parallel branches to encode the video and its motion into separate feature maps: a temporal image feature map (F'ᵢ) and a motion feature map (Oᵢ). Using these features, the Detection and Tracking branches perform multi-subject tracking to reconstruct 3D human motion in camera coordinates.
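The two-branch encoding can be sketched at the level of array shapes. This is a toy NumPy illustration under stated assumptions, not the authors' network: a windowed average of per-frame feature maps stands in for the temporal image branch (F'ᵢ), a difference of consecutive feature maps is a crude proxy for the motion branch (Oᵢ), and channel-wise concatenation is an assumed way of handing both to the downstream heads. All function names are hypothetical.

```python
import numpy as np


def temporal_image_features(frame_feats, i, window=1):
    """Toy stand-in for the temporal image branch (F'_i): average the per-frame
    feature maps in a window around frame i. Output shape: (C, H, W)."""
    lo, hi = max(0, i - window), min(len(frame_feats), i + window + 1)
    return frame_feats[lo:hi].mean(axis=0)


def motion_features(frame_feats, i):
    """Toy stand-in for the motion branch (O_i): difference of consecutive
    feature maps as a crude motion proxy; zeros for the first frame."""
    if i == 0:
        return np.zeros_like(frame_feats[0])
    return frame_feats[i] - frame_feats[i - 1]


def fuse(f_temporal, o_motion):
    """Assumed fusion step: stack both maps along the channel axis."""
    return np.concatenate([f_temporal, o_motion], axis=0)


T, C, H, W = 4, 8, 16, 16  # frames, channels, height, width (arbitrary)
rng = np.random.default_rng(0)
feats = rng.standard_normal((T, C, H, W))  # pretend per-frame backbone features
fused = fuse(temporal_image_features(feats, 2), motion_features(feats, 2))
print(fused.shape)  # (16, 16, 16)
```

In the real model both branches are learned networks; the sketch only shows that each frame yields two parallel feature maps of matching spatial size that the detection and tracking heads can consume together.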

The estimated 3D Motion Offset map encodes the relative 3D movement of each subject between two frames. A novel memory unit extracts subject identities and constructs human trajectories in camera coordinates from the estimated 3D detections and 3D motion offsets. A new World branch then estimates a world motion map, which is used to recover the subjects' trajectories in global coordinates.
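The memory unit and the World branch can be illustrated with a small pure-Python sketch. It assumes, hypothetically, that each detection comes with a 3D motion offset pointing from the subject's position in the previous frame to its current one, and that the world motion map yields per-frame displacements in global coordinates that can be summed. `associate`, `integrate_world_trajectory`, `max_dist`, and `max_age` are illustrative names and parameters, not TRACE's API, and greedy nearest-neighbor matching is a simple stand-in for whatever matching the authors actually use.

```python
import math


def associate(memory, detections, offsets, frame, max_dist=0.5, max_age=30):
    """Hypothetical memory-unit sketch: link this frame's detections to stored tracks.

    memory:     dict track_id -> {"pos": (x, y, z), "last_seen": frame index}
    detections: 3D positions (camera coordinates) detected in this frame
    offsets:    one 3D motion offset per detection, assumed to point from the
                subject's previous-frame position to its current position
    Returns one track ID per detection and updates `memory` in place.
    """
    ids, used = [], set()
    next_id = max(memory, default=0) + 1
    for det, off in zip(detections, offsets):
        # Estimate where this subject was in the previous frame.
        prev_est = tuple(d - o for d, o in zip(det, off))
        best_id, best_dist = None, max_dist
        for tid, state in memory.items():
            # Skip tracks already claimed this frame or unseen for too long.
            if tid in used or frame - state["last_seen"] > max_age:
                continue
            dist = math.dist(prev_est, state["pos"])
            if dist < best_dist:
                best_id, best_dist = tid, dist
        if best_id is None:  # no match within max_dist: start a new identity
            best_id, next_id = next_id, next_id + 1
        used.add(best_id)
        memory[best_id] = {"pos": det, "last_seen": frame}
        ids.append(best_id)
    return ids


def integrate_world_trajectory(start, world_offsets):
    """Accumulate assumed per-frame global displacements into a trajectory."""
    traj = [tuple(start)]
    for off in world_offsets:
        traj.append(tuple(p + d for p, d in zip(traj[-1], off)))
    return traj


memory = {}
print(associate(memory, [(0.0, 0.0, 3.0), (1.0, 0.0, 4.0)], [(0, 0, 0)] * 2, frame=0))  # [1, 2]
print(associate(memory, [(1.1, 0.0, 4.0), (0.1, 0.0, 3.0)], [(0.1, 0, 0)] * 2, frame=1))  # [2, 1]
print(integrate_world_trajectory((0.0, 0.0, 0.0), [(0.5, 0.0, 0.0)] * 2))
# [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
```

Keeping unmatched tracks in memory for up to `max_age` frames is what lets an identity survive a temporary absence. After a long gap the stored position may be stale, so this position-only sketch is much weaker than a learned memory unit.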

Even with a robust 5D representation, real-world data for training and evaluating global human trajectory estimation is still lacking, because collecting global human trajectories and camera poses for dynamic-camera videos of natural scenes (DC videos) is difficult. The team therefore simulated camera motion to transform in-the-wild videos captured by stationary cameras into DC videos, producing a new dataset called DynaCam.
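The idea behind DynaCam, turning static-camera footage into moving-camera footage with known camera motion, can be sketched by sliding a crop window across each frame to imitate a panning camera. This is an assumed simplification (pure 2D translation, no rotation, zoom, or 3D reprojection) and not the authors' actual pipeline; `simulate_dynamic_camera` and its parameters are hypothetical.

```python
import numpy as np


def simulate_dynamic_camera(frames, keypoints, crop_h, crop_w, pan_per_frame):
    """Hypothetical sketch: turn a static-camera clip into a 'moving camera' clip
    by sliding a crop window across each frame (a pure 2D pan).

    frames:        array (T, H, W, 3) from a static camera
    keypoints:     array (T, K, 2) of (x, y) pixel coordinates in the original frames
    pan_per_frame: (dx, dy) pixels the virtual camera moves each frame
    Returns cropped frames, keypoints in cropped-image coordinates, and the crop
    offsets, which act as known ground-truth camera motion.
    """
    T, H, W, _ = frames.shape
    out_frames, out_kps, offsets = [], [], []
    for t in range(T):
        # Clamp the window so it stays inside the original frame.
        x0 = int(np.clip(t * pan_per_frame[0], 0, W - crop_w))
        y0 = int(np.clip(t * pan_per_frame[1], 0, H - crop_h))
        out_frames.append(frames[t, y0:y0 + crop_h, x0:x0 + crop_w])
        # Moving the camera right by x0 moves the subject left by x0 in the image.
        out_kps.append(keypoints[t] - np.array([x0, y0]))
        offsets.append((x0, y0))
    return np.stack(out_frames), np.stack(out_kps), offsets


frames = np.zeros((3, 100, 200, 3), dtype=np.uint8)
kps = np.tile(np.array([[120.0, 50.0]]), (3, 1, 1))  # one subject standing still
clip, new_kps, offs = simulate_dynamic_camera(frames, kps, 80, 120, (10, 0))
print(clip.shape, offs)  # (3, 80, 120, 3) [(0, 0), (10, 0), (20, 0)]
print(new_kps[:, 0, 0])  # [120. 110. 100.]
```

In the example the subject stands still in the original video, yet its image position drifts by 10 pixels per frame in the simulated clip. Because the crop offsets are known exactly, they serve as ground-truth camera motion, which is what makes synthetic DC videos useful for training and evaluating global trajectory estimation.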

The team evaluated TRACE on the DynaCam dataset and two multi-person in-the-wild benchmarks. On 3DPW, TRACE achieves state-of-the-art (SOTA) results. On MuPoTS-3D, TRACE tracks humans under long-term occlusion better than earlier 3D-representation-based approaches and tracking-by-detection methods. On DynaCam, TRACE outperforms GLAMR at estimating global 3D human trajectories from DC videos.

For future work, the team suggests investigating explicit camera motion estimation using training data such as BEDLAM, which includes complex human motion, 3D scenes, and camera motion.
