TensorFlow Introduces Depth API To Convert Single Images To 3D Photos

In computer graphics and computer vision, a depth map is an image channel that records, for each pixel, the distance from the viewpoint to the surface visible at that pixel. Because of its wide range of applications in augmented reality, portrait mode, and 3D reconstruction, depth sensing remains an active area of research, particularly since the release of the ARCore Depth API. The web community is also showing growing interest in combining depth capabilities with JavaScript to enhance existing web applications with real-time AR effects. Despite these advances, however, the number of photos that come with an associated depth map remains limited.
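At its simplest, a depth map is just a per-pixel grid of distances. The following minimal sketch (hypothetical values, pure Python, not part of the TensorFlow release) shows how raw metric depths are commonly normalized into an 8-bit grayscale channel for storage or visualization:

```python
def normalize_depth_map(depth, near, far):
    """Map metric depths in [near, far] to 8-bit grayscale (0 = near, 255 = far)."""
    out = []
    for row in depth:
        out.append([
            round(255 * (min(max(d, near), far) - near) / (far - near))
            for d in row
        ])
    return out

# A tiny 2x3 "image": per-pixel distances in meters (illustrative values).
depth_m = [[0.5, 1.0, 2.0],
           [0.5, 1.5, 2.0]]
gray = normalize_depth_map(depth_m, near=0.5, far=2.0)
print(gray)  # [[0, 85, 255], [0, 170, 255]]
```

Near and far clipping planes are a design choice: depths outside the range are clamped rather than discarded, so every pixel still gets a valid grayscale value.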

To drive the next generation of web applications, TensorFlow released its first depth estimation API, called Depth API, along with ARPortraitDepth, a model for estimating a depth map from a portrait. The team also published 3D Photo, a computational photography application that uses the predicted depth to create a 3D parallax effect on a given portrait image, further demonstrating the potential of depth information. TensorFlow has also launched a live demo where anyone can convert their own photographs into 3D versions.
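The parallax effect behind such 3D photos comes from a simple geometric fact: when the virtual camera pans, nearby pixels shift more than distant ones. The toy sketch below (a hypothetical 1-D scanline in pure Python, not the actual 3D Photo renderer) reprojects pixels with a disparity proportional to inverse depth, painting far-to-near so closer content correctly occludes:

```python
def parallax_row(row, depths, camera_shift, width):
    """Reproject one scanline: nearer pixels (smaller depth) shift more."""
    out = [None] * width
    # Painter's algorithm: process far-to-near so near pixels overwrite far ones.
    for x, px, d in sorted(zip(range(len(row)), row, depths), key=lambda t: -t[2]):
        nx = x + round(camera_shift / d)  # disparity ~ 1 / depth
        if 0 <= nx < width:
            out[nx] = px  # holes (None) would be inpainted in a real renderer
    return out

row    = ['a', 'b', 'c', 'd']
depths = [1.0, 2.0, 4.0, 4.0]   # meters; 'a' is closest to the camera
print(parallax_row(row, depths, camera_shift=2.0, width=4))
# [None, None, 'a', 'd']
```

The `None` holes illustrate why real 3D-photo pipelines need inpainting: shifting foreground pixels exposes background regions the original photo never captured.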

Portrait Depth API is a deep learning model that outputs a depth map from a single color portrait photograph. To keep computation light, the team used a compact U-Net architecture. The design consists of an encoder that progressively halves the spatial resolution of the feature maps and a decoder that upscales them back to the input resolution. Deep features from the encoder are concatenated to the decoder layers of matching spatial resolution, providing high-resolution signals for depth estimation. During training, the decoder is made to produce depth predictions of increasing resolution at each layer, with a loss applied to each one; this lets the decoder refine the depth estimate by gradually adding detail.
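The resolution schedule described above can be sketched with simple bookkeeping. This is an illustrative sketch only (the actual ARPortraitDepth layer counts and sizes are not published here, and `input_size=256`, `levels=3` are assumed values): the encoder halves resolution per level, each decoder stage concatenates the encoder features of matching resolution (skip connection), and a depth loss is attached to every decoder output resolution:

```python
def unet_resolutions(input_size, levels):
    """Track spatial resolutions through a U-Net with `levels` down/upsampling steps."""
    # Encoder halves the feature-map resolution at every level.
    encoder = [input_size // (2 ** i) for i in range(levels + 1)]
    # Decoder mirrors the encoder, doubling back up to the input resolution.
    decoder = list(reversed(encoder))
    # Skip connections pair encoder/decoder stages of identical resolution.
    skips = [(r, r) for r in decoder[1:]]
    # Multi-scale supervision: one depth loss per decoder output resolution.
    loss_resolutions = decoder[1:]
    return encoder, decoder, skips, loss_resolutions

enc, dec, skips, losses = unet_resolutions(input_size=256, levels=3)
print(enc)     # [256, 128, 64, 32]
print(losses)  # [64, 128, 256]
```

The coarse-to-fine loss list mirrors the training strategy in the text: the network first commits to a low-resolution depth layout, then sharpens it at each upscaling step.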

Feeding an ML model with a large and varied dataset is critical for accuracy and robustness. To construct a high-quality training dataset, the researchers created pairs of color and depth images using a high-quality performance capture system with multiple camera setups. Real-world data were also acquired from mobile phones with a front-facing depth sensor; although that depth quality was not on par with the synthetically generated data, the accompanying color images proved beneficial in boosting the model's performance. To improve robustness against background variation, a segmentation model built with MediaPipe and TensorFlow.js is run before the image is passed to the depth estimation neural network. The researchers believe that this portrait depth model could open the door to a plethora of new creative applications centered on the human body. Detailed instructions for installing the Portrait Depth API, one of the variants of the new Depth API, are available via the links below.
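The segmentation step above amounts to masking out everything that is not the person before depth estimation runs. The sketch below mimics that preprocessing in pure Python on toy data (hypothetical pixel values and mask; the real pipeline uses a MediaPipe segmentation model in TensorFlow.js):

```python
def mask_background(image, mask, fill=0):
    """Keep pixels where mask == 1 (person); replace background pixels with `fill`."""
    return [[px if m else fill for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

# Toy 2x3 grayscale "portrait" and its person-segmentation mask.
image = [[10, 20, 30],
         [40, 50, 60]]
mask  = [[0, 1, 1],    # 1 = person, 0 = background
         [0, 1, 0]]
print(mask_background(image, mask))  # [[0, 20, 30], [0, 50, 0]]
```

Zeroing the background means the depth network never has to learn what arbitrary scenery looks like, which is what makes the model robust to background variation.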

GitHub: https://github.com/tensorflow/tfjs-models/blob/master/depth-estimation/README.md

Demo: https://storage.googleapis.com/tfjs-models/demos/3dphoto/index.html

Source: https://blog.tensorflow.org/2022/05/portrait-depth-api-turning-single-image.html
