Latest Computer Vision Research Proposes Lumos for Relighting Portrait Images via a Virtual Light Stage and Synthetic-to-Real Adaptation

If you have ever worked with photo editing, then you probably know how cumbersome it can be to adjust the lighting of a portrait image when you move the person from one environment to another one. You need to track the lightning sources in the new environment and make sure you edit the photo properly, so the portrait looks natural in there. Portrait relighting is a tricky problem.

Portrait relighting seeks to re-illuminate the subject of the image as though they were present in an environment with the target lighting from a portrait photograph of the subject and an environment map of the target lighting. This process can be done manually, as it has been done for years in professional settings. Still, we are more interested in how to automate it to speed up things and enable it to be used casually, like in smartphones.

Understanding the geometry and components of a 3D scene is necessary to mimic complicated lighting interactions, such as global illumination. Consequently, it is quite challenging to relight a picture from a 2D in-the-wild portrait without prior knowledge.

Before the deep learning era, portrait lightning was done using prior information such as 3D face priors, intrinsic pictures, and form from shading or applying style transfer. These models frequently fall short regarding complicated skin surfaces and subsurface reflectivity, an extensive range of clothing options, hairstyles, or intricate reflection patterns between eyewear and other accessories.

As with many applications, deep learning methods have also tackled portrait relighting in recent years. However, as with all deep learning solutions, the requirement for a large dataset is the first thing to be solved. For portrait relighting, the dataset construction can be a bit tricky due to the under-constrained nature of the problem, meaning that there can be thousands of different lighting conditions for the very same picture.

Several studies use a physical light stage to construct the dataset to train the deep learning method for portrait relighting. This physical stage is a spherical lighting setup uniformly dispersed over the sphere and has hundreds or thousands of programmable inward-pointing light sources. Each light source can provide a unique lighting direction. 

These light sources are activated one light at a time (OLAT), and different portrait images are captured in various lighting conditions. Pairs of relit portrait photos and their accompanying target environment maps make up each sample in the dataset. These pairs are then used to train the deep learning model, which tries to predict the relit image given an input portrait and a target environment map.

Building a light stage is costly, even if it is helpful for ground truth computing. Additionally, preparing a light stage is just the beginning. For data collection, it is still necessary to find several individuals who will remain seated within the light stage. The subsequent OLAT scans likewise demand time-consuming and labor-intensive operations to capture and analyze.

LUMOS tackles this dataset preparation problem by moving things into the virtual space. 

The virtual dataset preparation is a challenging process as it should be realistic enough to train the network. Therefore, LUMOS relies on the major observation that the relighting problem can be decoupled into two parts. The first is learning to imitate physically-based lighting characteristics using synthetic data generated by a path tracer, and the second is learning to synthesize photorealistic portrait photos.

A physically-based renderer is used to build a virtual light stage to re-illuminate pre-captured photogrammetry face scans to produce the training dataset. Photogrammetry-based face scans are much easier to collect than OLAT scans, so the dataset can have way more variation. 

OLAT vs. virtual light stage used in Lumos. Source: https://arxiv.org/abs/2209.10510

Generating datasets in a virtual environment removes the hassle of building a light stage and collecting data using real people sitting there for minutes. However, if the dataset only consists of synthetic portrait images, the deep learning model can decouple from reality. LUMOS uses real-world portrait images to adapt intermediate representations learned in the portrait relighting network to overcome this issue. A novel self-supervised lighting consistency loss is used to force the network to maintain consistent relighting during domain adaptation.

Finally, the authors asked themselves, “what about the video relighting?” and extended LUMOS for the video use case. Since doing portrait relighting frame-by-frame would have resulted in flickering artifacts, extension from picture to video is modeled as a domain adaptation issue, and the same approach followed during constructing the synthetic-to-real dataset domain adaptation is applied. 

In the end, LUMOS is obtained. It proposed a synthetic dataset for single-image relighting with extremely diverse images to show the relighting problem can be tackled without the physical light stage. Self-supervised losses are proposed to ensure lighting consistency during the adaptation learning. Experiments showed that LUMOS could achieve an impressive result in both picture and video portrait relighting.   

Sample results obtained with LUMOS. Source: https://arxiv.org/abs/2209.10510
This Article is written as a research summary article by Marktechpost Staff based on the research paper 'Learning to Relight Portrait Images via a Virtual Light Stage and Synthetic-to-Real Adaptation'. All Credit For This Research Goes To Researchers on This Project. Check out the paper, demo and project.

Please Don't Forget To Join Our ML Subreddit

Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.

🐝 Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...