A New Artificial Intelligence (AI) Study From CMU and Meta Proposes a Framework for Efficient Neural Relighting of Articulated Hand Models

Neural rendering uses deep learning to create photorealistic images and animations. Unlike traditional rendering techniques, which rely on explicitly specified models of geometry, materials, and light transport, neural rendering algorithms learn from data to replicate the complex interactions between light and materials in the real world. This makes it possible to create images with a high level of detail, texture, and realism.

The importance of neural rendering lies in its ability to enhance the quality and efficiency of computer graphics. By eliminating the need for labor-intensive manual processes and simplifying the rendering pipeline, neural rendering can significantly reduce the time and cost involved in creating high-quality images and animations. This makes it an invaluable tool for professionals in industries such as film, video game development, and virtual and augmented reality.

In addition, neural rendering can be used for a variety of creative applications, such as generating new viewpoints of existing scenes, enhancing low-resolution images, and enabling interactive exploration of digital environments.

Many state-of-the-art neural rendering models rely on simplified geometric and appearance models (such as linear blend skinning and reduced material models). This allows faster computation but comes with a noticeable degradation in rendering fidelity.

So far, photorealistic rendering of animatable hands with global illumination effects in real time remains an open challenge.

To address this problem, an AI framework has been developed to enable the photorealistic rendering of a personalized hand model that can be animated with novel poses in novel lighting environments and supports rendering two-hand interactions. The idea is to construct a relightable hand model to reproduce light-stage captures of dynamic hand motions. For this purpose, the authors capture spatiotemporal-multiplexed illumination patterns, where fully-on illumination is interleaved to enable tracking of the current state of hand geometry and poses.

This neural relighting framework relies on a two-stage teacher-student interaction for real-time rendering. 

An overview of the teacher model is depicted below.

The teacher model is trained to infer a radiance value given a point-light position, a viewing direction, and light visibility.

Learning the mapping from an input light position to output radiance lets the network model complex reflectance and scattering on the hand without the need for path tracing.

To render hands under arbitrary illumination, natural illumination is modeled as a combination of distant point-light sources.
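Because light transport is linear, the radiance under several lights is the sum of the radiance under each light alone. The following is a minimal sketch of that idea for a single pixel, not the authors' implementation. The function name `relight` and the per-light RGB weights are hypothetical; in the framework described here, the per-light radiance would come from the teacher model.

```python
def relight(per_light_radiance, light_weights):
    """Sum one pixel's RGB radiance over a set of distant point lights.

    per_light_radiance[i]: RGB radiance of the pixel when only light i is on
                           at unit intensity (assumed to be given).
    light_weights[i]:      RGB intensity of light i in the target environment
                           (hypothetical discretization of an environment map).
    """
    if len(per_light_radiance) != len(light_weights):
        raise ValueError("need exactly one weight per light")

    total = [0.0, 0.0, 0.0]
    for radiance, weight in zip(per_light_radiance, light_weights):
        for channel in range(3):
            # Linearity of light transport: contributions simply add up.
            total[channel] += radiance[channel] * weight[channel]
    return tuple(total)


# Two lights: a white light at intensity 2 and a pure green light at intensity 4.
pixel = relight(
    per_light_radiance=[(0.5, 0.25, 0.125), (0.25, 0.25, 0.25)],
    light_weights=[(2.0, 2.0, 2.0), (0.0, 4.0, 0.0)],
)
print(pixel)
```

The cost of this sum grows with the number of lights, which is one reason evaluating it directly for a dense environment map is expensive.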

The teacher model’s renderings are then used as pseudo ground truth to train an efficient student model conditioned on the target environment maps, as illustrated in the picture below.
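The teacher-student idea can be illustrated with a deliberately tiny example. This is only a sketch under strong assumptions: the `teacher` below is a made-up scalar function standing in for the expensive teacher network, the student is a two-parameter linear model rather than a convolutional network conditioned on environment maps, and the squared-error loss is an assumption, not something stated in the summary above.

```python
import random


def teacher(light_intensity):
    # Hypothetical stand-in for the slow, accurate teacher model.
    return 0.8 * light_intensity + 0.1


def train_student(num_steps=5000, learning_rate=0.2, seed=0):
    """Fit a small student to the teacher's outputs (pseudo ground truth)."""
    rng = random.Random(seed)
    weight, bias = 0.0, 0.0  # student parameters
    for _ in range(num_steps):
        x = rng.random()                 # sample a lighting condition in [0, 1)
        target = teacher(x)              # pseudo ground truth from the teacher
        prediction = weight * x + bias   # cheap student prediction
        error = prediction - target
        # Stochastic gradient step on 0.5 * error**2.
        weight -= learning_rate * error * x
        bias -= learning_rate * error
    return weight, bias


weight, bias = train_student()
print(weight, bias)  # should end up close to the teacher's 0.8 and 0.1
```

The point of the pattern is that no captured image is needed for the student's training targets: the teacher can be queried for any lighting condition the student should handle.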

Following recent neural portrait relighting studies, lighting information is encoded using physics-inspired illumination features such as visibility, diffuse shading, and specular reflections. Because these features are computed from geometry and represent the first bounce of light transport, they correlate strongly with the final appearance and make it easier to infer the correct radiance under natural lighting conditions. Visibility, in particular, is essential for disentangling lights and poses, which reduces the learning of spurious correlations that can exist in limited training data. However, computing visibility precisely for every light is too computationally expensive for real-time rendering.

To overcome this limitation, a coarse proxy mesh is used for computing the lighting features. This mesh shares the same UV (two-dimensional) parameterization as the hand model.
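To make the lighting features concrete, here is a minimal sketch of how visibility, diffuse shading, and specular reflection could be computed for one surface point and one distant light against a small set of proxy triangles. Everything in it is an assumption for illustration: the summary does not say which shading models the authors use, so the sketch swaps in Lambertian diffuse shading, a Blinn-Phong specular lobe, and a brute-force Möller-Trumbore ray-triangle test, and the function names and the `shininess` parameter are hypothetical.

```python
import math

EPS = 1e-9


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / length, v[1] / length, v[2] / length)


def ray_hits_triangle(origin, direction, triangle):
    """Moller-Trumbore test: does the ray hit the triangle in front of origin?"""
    v0, v1, v2 = triangle
    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, edge2)
    det = dot(edge1, p)
    if abs(det) < EPS:  # ray is parallel to the triangle plane
        return False
    inv_det = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv_det
    if u < 0.0 or u > 1.0:
        return False
    q = cross(t_vec, edge1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return False
    return dot(edge2, q) * inv_det > EPS  # hit must lie in front of the origin


def lighting_features(point, normal, light_dir, view_dir, proxy_triangles,
                      shininess=32.0):
    """Visibility, diffuse, and specular features for one distant light."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    # Shadow ray toward the light, tested only against the coarse proxy mesh.
    blocked = any(ray_hits_triangle(point, l, tri) for tri in proxy_triangles)
    visibility = 0.0 if blocked else 1.0
    n_dot_l = max(0.0, dot(n, l))
    diffuse = visibility * n_dot_l  # Lambertian term (assumed)
    specular = 0.0
    half = (l[0] + v[0], l[1] + v[1], l[2] + v[2])
    if n_dot_l > 0.0 and dot(half, half) > 0.0:
        # Blinn-Phong lobe (assumed)
        specular = visibility * max(0.0, dot(n, normalize(half))) ** shininess
    return {"visibility": visibility, "diffuse": diffuse, "specular": specular}


# One occluding triangle floating above the shaded point.
occluder = ((-1.0, -1.0, 1.0), (1.0, -1.0, 1.0), (0.0, 1.0, 1.0))
up = (0.0, 0.0, 1.0)
print(lighting_features((0.0, 0.0, 0.0), up, up, up, [occluder]))
print(lighting_features((0.0, 0.0, 0.0), up, up, up, []))
```

Testing shadow rays against a coarse proxy keeps the number of triangles small, at the price of approximate features that a downstream network then has to correct.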

The fully convolutional architecture learns to compensate for the approximate nature of the input features and infers both local and global light transport effects. This way, according to the authors, the framework achieves a high frame rate and can render appearance under natural illumination in real time.

The figure below represents some results achieved by the proposed approach.

This was the summary of a novel AI framework for real-time, efficient neural relighting of articulated hand models.

If you are interested or want to learn more about this framework, you can find a link to the paper and the project page.

Check out the Paper and Project Page. All credit for this research goes to the researchers on this project.

Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He is currently working in the Christian Doppler Laboratory ATHENA and his research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.