NVIDIA AI Proposes A Novel AI Framework For Mixed Reality Tasks, Such As Photorealistic Virtual Object Insertion

Source: https://arxiv.org/pdf/2109.06061.pdf

Jointly estimating albedo, normals, depth, and 3D spatially-varying lighting from a single image is challenging. Existing methods typically formulate the problem as image-to-image translation and ignore the 3D properties of the scene. That is a real limitation: a 2D representation of an indoor scene cannot capture how light propagates through 3D space.

Researchers from NVIDIA, the University of Toronto, and the Vector Institute propose a new approach to estimating reflectance, shape, and 3D spatially-varying lighting. It models the complete rendering process in an end-to-end trainable way using a 3D lighting representation. That representation, called Volumetric Spherical Gaussian (VSG), parameterizes the exitant radiance of the scene's 3D surfaces on a voxel grid.
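For reference, a spherical Gaussian lobe is commonly written as the first line below. Here $\boldsymbol{\xi}$ is the unit lobe axis, $\lambda$ the sharpness, and $\boldsymbol{\mu}$ the RGB amplitude. The second line is a sketch of how the radiance reaching a point $\mathbf{x}$ from direction $\boldsymbol{\omega}$ could be read out of such a voxel grid by front-to-back alpha compositing over the $K$ voxels the ray $\mathbf{x} + t\boldsymbol{\omega}$ passes through, nearest first. The per-voxel opacity $\alpha_k$ and this compositing rule are assumptions borrowed from standard volume rendering, not necessarily the paper's exact formulation.

```latex
G(\mathbf{v};\, \boldsymbol{\xi}, \lambda, \boldsymbol{\mu}) = \boldsymbol{\mu}\, \exp\!\big(\lambda(\mathbf{v}\cdot\boldsymbol{\xi} - 1)\big)

L(\mathbf{x}, \boldsymbol{\omega}) \approx \sum_{k=1}^{K} T_k\, \alpha_k\, G(-\boldsymbol{\omega};\, \boldsymbol{\xi}_k, \lambda_k, \boldsymbol{\mu}_k),
\qquad T_k = \prod_{j=1}^{k-1} (1 - \alpha_j)
```

The lobe peaks at $\boldsymbol{\mu}$ when $\mathbf{v} = \boldsymbol{\xi}$ and decays faster as $\lambda$ grows, which is what lets a voxel represent sharp, directional emission. It is evaluated at $-\boldsymbol{\omega}$ because each voxel emits back toward $\mathbf{x}$.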

Each voxel stores a set of spherical Gaussian parameters that control the direction and sharpness of the light it emits, which lets the representation handle strong directional lighting. Because ground-truth HDR lighting is hard to obtain, the team designed a raytracing-based differentiable renderer built on this lighting representation. The renderer formulates an energy-preserving image formation process, which keeps the predictions physically consistent without requiring ground-truth HDR lighting for supervision.
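To make the idea of energy-preserving image formation more concrete, here is a minimal sketch of shading a diffuse point from a VSG-like voxel grid. It is not the authors' renderer. Several assumptions are made for illustration: a Lambertian BRDF, front-to-back alpha compositing along each ray, cosine-weighted Monte Carlo sampling in a local frame whose normal is +z, and a caller-supplied `trace` function that returns the voxels a ray passes through. All names (`Voxel`, `sg_eval`, `incident_radiance`, `shade_lambertian`, `trace`, `uniform_sky`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Voxel:
    """One cell of a hypothetical VSG grid (illustrative, not the paper's code)."""
    alpha: float       # opacity in [0, 1]
    axis: Vec3         # unit lobe axis xi: the dominant emission direction
    sharpness: float   # lobe sharpness lambda >= 0 (0 means isotropic)
    amplitude: Vec3    # RGB amplitude mu; HDR, so values may exceed 1


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def sg_eval(v: Vec3, axis: Vec3, sharpness: float, amplitude: Vec3) -> Vec3:
    """Spherical Gaussian lobe: mu * exp(lambda * (v . xi - 1))."""
    w = math.exp(sharpness * (dot(v, axis) - 1.0))
    return tuple(a * w for a in amplitude)


def incident_radiance(voxels_along_ray: List[Voxel], omega: Vec3) -> Vec3:
    """Front-to-back alpha compositing over the voxels hit by the ray x + t*omega.

    Voxels must be ordered nearest first; each emits back toward x, i.e. along -omega.
    """
    radiance = [0.0, 0.0, 0.0]
    transmittance = 1.0
    toward_x = tuple(-c for c in omega)
    for vox in voxels_along_ray:
        emitted = sg_eval(toward_x, vox.axis, vox.sharpness, vox.amplitude)
        for c in range(3):
            radiance[c] += transmittance * vox.alpha * emitted[c]
        transmittance *= 1.0 - vox.alpha
    return tuple(radiance)


def shade_lambertian(albedo: Vec3, trace: Callable[[Vec3], List[Voxel]],
                     num_samples: int = 256, seed: int = 0) -> Vec3:
    """Monte Carlo estimate of (albedo / pi) * integral of L(omega) cos(theta) d omega.

    Directions are drawn with pdf cos(theta) / pi, so each sample contributes
    albedo * L(omega). With every albedo channel in [0, 1], the reflected radiance
    cannot exceed the (cosine-weighted) average incident radiance.
    """
    rng = random.Random(seed)
    total = [0.0, 0.0, 0.0]
    for _ in range(num_samples):
        u1, u2 = rng.random(), rng.random()
        r, phi = math.sqrt(u1), 2.0 * math.pi * u2
        omega = (r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1))
        L = incident_radiance(trace(omega), omega)
        for c in range(3):
            total[c] += L[c]
    return tuple(albedo[c] * total[c] / num_samples for c in range(3))


# Toy scene (hypothetical): every ray hits a single opaque, isotropic voxel.
def uniform_sky(omega: Vec3) -> List[Voxel]:
    return [Voxel(alpha=1.0, axis=(0.0, 0.0, 1.0), sharpness=0.0,
                  amplitude=(2.0, 2.0, 2.0))]


print(shade_lambertian((0.5, 0.5, 0.5), uniform_sky))  # (1.0, 1.0, 1.0)
```

Under uniform incident radiance of 2.0, a 50% albedo surface reflects exactly 1.0, which illustrates the energy bound. In a learned system, every step above is differentiable with respect to the voxel parameters, so a re-rendering loss could in principle push gradients back into the lighting estimate.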

https://nv-tlabs.github.io/inverse-rendering-3d-lighting/

According to the paper, the approach outperforms existing state-of-the-art inverse rendering and lighting estimation methods. The framework is a holistic monocular inverse rendering method that jointly estimates albedo, normals, depth, and the HDR light field from a single image. The Volumetric Spherical Gaussian representation captures high-frequency lighting detail both spatially and angularly, and the model predicts HDR lighting even though it is trained only on LDR images. These results suggest strong potential for augmented reality (AR) applications such as virtual object insertion.

Paper: https://arxiv.org/pdf/2109.06061.pdf

Project: https://nv-tlabs.github.io/inverse-rendering-3d-lighting/