TensorFlow Team Introduces BlazePose GHUM Pose Estimation Model and Selfie Segmentation for Body Segmentation Using MediaPipe and TensorFlow.js

Image segmentation is a computer vision technique that groups the pixels of an image into semantic areas, typically to locate objects and boundaries. Body segmentation models do the same thing for a person and their twenty-four body parts. The technology has many uses, including augmented reality, photo editing, and creative effects on images and videos.

The TensorFlow team has recently released two new, highly optimized body segmentation models that are both accurate and fast, as part of the updated body-segmentation and pose-detection APIs in TensorFlow.js.

The first model is BlazePose GHUM, a pose estimation model that now includes segmentation support. It is part of the unified pose-detection API and can perform whole-body segmentation and 3D pose estimation simultaneously. It is well suited to bodies that are in full view and farther from the camera, where the legs and feet need to be captured accurately.

The second model is Selfie Segmentation, which is well suited to video calls where someone is directly in front of a webcam (within 2 meters). It is part of the unified body-segmentation API. The team notes that it is highly accurate on the upper body but may be less accurate on the lower body.

In 2019, the team launched the BodyPix model, which was state of the art at its release. The two new models offer much higher FPS and fidelity across devices for a variety of use cases.

For the Selfie Segmentation model, the body-segmentation API supports two runtimes: the MediaPipe runtime and the TensorFlow.js runtime.
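
As a minimal sketch of what choosing between the two runtimes could look like, the snippet below builds a configuration object for each one. The types are declared locally for illustration rather than imported, and the CDN path is an assumption. In an application, an object of this shape would typically be passed to `bodySegmentation.createSegmenter` from the `@tensorflow-models/body-segmentation` package; check the package documentation for the exact fields.

```typescript
// Local types that mirror the general shape of a segmenter config.
// They are written here for illustration and are not imported from the library.
type SelfieRuntime = "mediapipe" | "tfjs";

interface SelfieSegmenterConfig {
  runtime: SelfieRuntime;
  modelType: "general" | "landscape";
  // Only the MediaPipe runtime needs a location for its solution files.
  solutionPath?: string;
}

// Assumed CDN location of the MediaPipe solution files.
const ASSUMED_SOLUTION_PATH =
  "https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation";

// Build a config for the requested runtime.
function buildSelfieConfig(runtime: SelfieRuntime): SelfieSegmenterConfig {
  if (runtime === "mediapipe") {
    return { runtime, modelType: "general", solutionPath: ASSUMED_SOLUTION_PATH };
  }
  return { runtime, modelType: "general" };
}

const mediapipeConfig = buildSelfieConfig("mediapipe");
const tfjsConfig = buildSelfieConfig("tfjs");
```

Keeping the runtime choice in one small function makes it easy to switch runtimes per device without touching the rest of the application.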

Both BlazePose GHUM and MediaPipe Selfie Segmentation segment the prominent people in the frame. Both run in real time on laptops and smartphones, but their intended applications differ. Selfie Segmentation targets selfie effects and video conferencing in close-up situations (less than 2 meters). BlazePose GHUM targets full-body cases such as yoga, fitness, and dance, and works up to 4 meters from the camera.
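
The distance guidance above can be turned into a simple selection rule. The sketch below is only an illustration: the function name `chooseModel` and the `needsFullBody` flag are hypothetical, and the thresholds are the 2-meter and 4-meter figures quoted in this article, not values read from the library.

```typescript
type BodyModel = "SelfieSegmentation" | "BlazePoseGHUM";

// Distance limits quoted in the article, in meters.
const SELFIE_MAX_METERS = 2;
const BLAZEPOSE_MAX_METERS = 4;

// Hypothetical helper: pick a model from the subject's distance and
// whether the application needs the full body (legs and feet).
// Returns null when the subject is beyond both stated ranges.
function chooseModel(distanceMeters: number, needsFullBody: boolean): BodyModel | null {
  if (Number.isNaN(distanceMeters) || distanceMeters < 0) {
    throw new RangeError("distanceMeters must be a non-negative number");
  }
  if (distanceMeters > BLAZEPOSE_MAX_METERS) {
    return null;
  }
  if (needsFullBody || distanceMeters >= SELFIE_MAX_METERS) {
    return "BlazePoseGHUM";
  }
  return "SelfieSegmentation";
}

const videoCall = chooseModel(0.6, false); // close-up webcam use
const yogaClass = chooseModel(3, true);    // full body, farther away
```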

The Selfie Segmentation model predicts a binary segmentation mask that separates people in the foreground from the background. The pipeline is designed to run entirely on the GPU, from image acquisition through neural network inference to rendering the segmented output on screen, which avoids slow CPU-GPU syncs and maximizes performance. Variants of the model power background replacement in Google Meet, and a more general model is now available in TensorFlow.js and MediaPipe.
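
To make the role of the binary mask concrete, here is a minimal CPU sketch of background replacement. It is not the GPU pipeline described above; it only shows the per-pixel logic. The function name `replaceBackground` and the data layout (RGBA bytes per pixel, one mask value per pixel with 1 meaning "person") are assumptions made for the example.

```typescript
// Composite a camera frame over a replacement background using a binary mask.
// frame and background are RGBA byte arrays of equal length; mask holds one
// value per pixel (1 = person, 0 = background).
function replaceBackground(
  frame: Uint8ClampedArray,
  background: Uint8ClampedArray,
  mask: Uint8Array
): Uint8ClampedArray {
  if (frame.length !== background.length || frame.length !== mask.length * 4) {
    throw new Error("frame, background and mask sizes do not match");
  }
  const out = new Uint8ClampedArray(frame.length);
  for (let p = 0; p < mask.length; p++) {
    // Keep the camera pixel where the mask marks a person,
    // otherwise take the pixel from the replacement background.
    const src = mask[p] === 1 ? frame : background;
    const base = p * 4;
    out.set(src.subarray(base, base + 4), base);
  }
  return out;
}

// Two-pixel example: a red frame, a blue background, person in pixel 0 only.
const frame = new Uint8ClampedArray([255, 0, 0, 255, 255, 0, 0, 255]);
const background = new Uint8ClampedArray([0, 0, 255, 255, 0, 0, 255, 255]);
const mask = new Uint8Array([1, 0]);
const composited = replaceBackground(frame, background, mask);
// composited is [255, 0, 0, 255, 0, 0, 255, 255]
```

A background blur works the same way, with a blurred copy of the frame passed in as the background.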

The BlazePose GHUM model now also outputs a body segmentation mask, so a single model predicts both the pose and the mask. This lets the two outputs supervise and improve each other, and it guarantees that the predicted mask and keypoints belong to the same person, which is difficult to achieve with separate models. Because BlazePose GHUM runs only on a region-of-interest (ROI) crop of the person rather than on the entire image, segmentation mask quality depends only on the effective resolution within the ROI and changes little as the person moves closer to or farther from the camera.
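
A rough arithmetic illustration shows why cropping to the ROI keeps the resolution on the person fairly stable. All numbers below are assumptions chosen for the example (a 720-pixel-tall camera frame, a 256-pixel model input, and an ROI exactly as tall as the person); none of them are taken from the model itself.

```typescript
// Assumed numbers for illustration only; they are not taken from the model.
const ASSUMED_MODEL_INPUT_PX = 256;
const ASSUMED_FRAME_HEIGHT_PX = 720;

// Person height in model-input pixels when the whole frame is resized
// to the model input. personFraction is the share of the frame height
// that the person occupies.
function fullFramePersonPx(personFraction: number): number {
  return personFraction * ASSUMED_MODEL_INPUT_PX;
}

// Person height in model-input pixels when a person-sized ROI is cropped
// first. Upscaling cannot add detail, so the source pixel count is a cap.
function roiCropPersonPx(personFraction: number): number {
  const sourcePx = personFraction * ASSUMED_FRAME_HEIGHT_PX;
  return Math.min(sourcePx, ASSUMED_MODEL_INPUT_PX);
}

// Person far from the camera (25% of frame height) and near it (75%).
const far = { fullFrame: fullFramePersonPx(0.25), roiCrop: roiCropPersonPx(0.25) };
const near = { fullFrame: fullFramePersonPx(0.75), roiCrop: roiCropPersonPx(0.75) };
// far:  fullFrame = 64,  roiCrop = 180
// near: fullFrame = 192, roiCrop = 256
```

Under these assumptions, the full-frame approach triples its resolution on the person between the two distances, while the ROI crop changes by well under a factor of two.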

The MediaPipe runtime delivers faster inference on desktops, laptops, and Android phones, whereas the TensorFlow.js runtime is faster on iPhones and iPads.
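
This guidance can be expressed as a small helper that picks a runtime from the browser's user-agent string. The sketch below is an assumption-laden illustration: `pickRuntime` is a hypothetical name, and user-agent sniffing is deliberately simplistic (for example, recent iPads may report a desktop-style user agent and would need extra checks).

```typescript
type SegmentationRuntime = "mediapipe" | "tfjs";

// Hypothetical helper: prefer the TensorFlow.js runtime on iPhones and
// iPads, and the MediaPipe runtime everywhere else, following the
// guidance in the article.
function pickRuntime(userAgent: string): SegmentationRuntime {
  const isAppleMobile = /iPhone|iPad|iPod/.test(userAgent);
  return isAppleMobile ? "tfjs" : "mediapipe";
}

const onIphone = pickRuntime("Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)");
const onAndroid = pickRuntime("Mozilla/5.0 (Linux; Android 12; Pixel 6)");
const onDesktop = pickRuntime("Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
```

In a browser, the argument would normally come from `navigator.userAgent`.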

Performance is reported in frames per second (FPS), derived from the time it takes to run inference through the model and wait for the GPU and CPU to sync. The wait ensures that the GPU has fully finished its work, which matters for benchmarking; pure-GPU production pipelines do not need to wait.
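
As a small worked example of how an FPS figure follows from per-frame timings, the sketch below averages a list of measured frame times. The function name and the sample timings are made up for illustration; they are not benchmark results from the team.

```typescript
// Convert per-frame timings (inference plus GPU-CPU sync, in milliseconds)
// into an average frames-per-second figure.
function framesPerSecond(frameTimesMs: number[]): number {
  if (frameTimesMs.length === 0) {
    throw new Error("need at least one frame timing");
  }
  const totalMs = frameTimesMs.reduce((sum, t) => sum + t, 0);
  if (totalMs <= 0) {
    throw new Error("total time must be positive");
  }
  return (frameTimesMs.length * 1000) / totalMs;
}

// Three illustrative frames taking 18, 22 and 20 ms: 60 ms for 3 frames.
const fps = framesPerSecond([18, 22, 20]); // 50
```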

The two new models can power many innovative, body-centered applications in next-generation web apps. The BlazePose GHUM model, for example, could be used for services such as digitally teleporting your presence anywhere in the world, estimating body measurements for a virtual tailor, and producing special effects for music videos. The Selfie Segmentation model, on the other hand, could provide user-friendly features for web-based video calls, such as accurately replacing or blurring the background, as in the team's demo.

The team is now working on new features and quality improvements for future versions.

GitHub: https://github.com/tensorflow/tfjs-models/tree/master/pose-detection/src/blazepose_mediapipe

Demo: https://storage.googleapis.com/tfjs-models/demos/segmentation/index.html?model=blazepose

Reference: https://blog.tensorflow.org/2022/01/body-segmentation.html