Latest Computer Vision Research Proposes An Optimization Framework Called ‘SAMURAI’ For Joint Camera, Shape, BRDF, And Illumination Estimation

Immersive applications such as augmented reality (AR) and virtual reality (VR) are receiving more attention thanks to rapid advances in the field. A mobile game enriched with AR elements or a movie watched through VR glasses offers an enhanced user experience.

Preparing 3D content for immersive media experiences is a challenging task. Multi-view camera setups can capture objects in 3D and produce high-quality assets, but scanning every object in the world this way is infeasible. Therefore, more practical approaches are required to extract 3D information using off-the-shelf cameras.

Nowadays, tens or hundreds of images are available on the Internet for almost anything you can think of. There are multiple ways to gather a set of images for a given object, from image search results to product review photographs. What if we could use this enormous amount of data to construct 3D objects for AR and/or VR applications?

This is the question that SAMURAI tries to answer. Inverse rendering of an object captured under completely unknown conditions is a key open problem, and the authors propose a solution.

However, reconstructing 3D objects from unconstrained images is not an easy task. Since the images are so unrestricted, with different backgrounds, illuminations, and inherent camera characteristics, estimating 3D shape and materials from Internet images presents several difficulties. Therefore, joint optimization over shape, illumination, and pose is required.

Usually, in inverse rendering, the goal is to estimate the 3D shape and Bidirectional Reflectance Distribution Function (BRDF) properties of the object. However, when this is done from images whose capture conditions are unknown, per-image lighting, camera poses, and intrinsics must also be estimated.
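As a simplified illustration of how these unknowns interact (this is a minimal Lambertian stand-in, not the paper's actual reflectance model), forward rendering combines the shape estimate (normals), the BRDF parameters (albedo), and the illumination estimate, and inverse rendering compares the result to the observed pixels:

```python
import numpy as np

def lambertian_render(normals, albedo, light_dir, light_color):
    """Shade surface points with a simple Lambertian BRDF.

    normals:     (N, 3) unit surface normals -- stands in for the shape estimate
    albedo:      (N, 3) diffuse colors       -- the BRDF parameters
    light_dir:   (3,) unit direction toward a single distant light
    light_color: (3,) light intensity        -- the illumination estimate
    """
    cos_theta = np.clip(normals @ light_dir, 0.0, None)   # (N,)
    return albedo * light_color * cos_theta[:, None]      # (N, 3)

# Inverse rendering scores such renders against the observed pixels:
normals = np.array([[0.0, 0.0, 1.0]])
albedo = np.array([[0.5, 0.5, 0.5]])
rendered = lambertian_render(normals, albedo,
                             np.array([0.0, 0.0, 1.0]),
                             np.array([1.0, 1.0, 1.0]))
observed = np.array([[0.4, 0.4, 0.4]])
loss = np.mean((rendered - observed) ** 2)  # drives the joint optimization
```

In the unconstrained setting, the light direction/color and the camera producing each observation are themselves unknowns that must be optimized alongside shape and albedo.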

Several recent studies on shape and material estimation make the following assumptions: fixed camera intrinsics, nearly flawless segmentation masks, and nearly accurate camera poses. These assumptions rarely hold in real-world scenarios.

SAMURAI workflow and sample applications. Source: https://arxiv.org/pdf/2205.15768.pdf

SAMURAI is proposed to jointly estimate the shape, BRDF, and per-image camera pose and illumination. This is a severely under-constrained and complicated optimization problem when only image collections and rough camera pose quadrants are provided as input. SAMURAI tackles this tough challenge with well-thought-out camera parameterization and optimization approaches.

By learning clipping planes for each image and describing the neural volume in global coordinates, SAMURAI achieves a flexible camera parameterization that copes with varying capture distances.
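The idea behind per-image clipping planes can be sketched as follows (a minimal illustration assuming a single learned near/far pair per image; function and parameter names are hypothetical):

```python
import numpy as np

def sample_points_along_rays(origins, dirs, near, far, n_samples):
    """Sample 3D points in global coordinates between per-image
    near/far clipping planes.

    origins, dirs: (R, 3) camera rays in world space
    near, far:     scalars, this image's learned clipping planes
    """
    t = np.linspace(near, far, n_samples)  # (S,) depths along each ray
    # (R, S, 3) sample points; because near/far adapt per image, objects
    # photographed from very different distances still land inside the
    # shared neural volume
    return origins[:, None, :] + t[None, :, None] * dirs[:, None, :]

pts = sample_points_along_rays(np.zeros((2, 3)),
                               np.array([[0.0, 0.0, 1.0],
                                         [0.0, 1.0, 0.0]]),
                               near=0.5, far=4.0, n_samples=8)
```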

Moreover, since optimizing a single camera per image can get stuck in local minima, SAMURAI uses a multiplex camera setup where several camera poses are optimized per image.
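A simple way to picture the camera multiplex (the softmax weighting below is an illustrative stand-in; the paper's exact scheme may differ) is to keep several candidate poses per image and let better-fitting candidates dominate the combined loss:

```python
import numpy as np

def multiplex_loss(per_pose_losses, temperature=1.0):
    """Combine losses from several candidate poses of one image.

    Instead of committing to a single (possibly wrong) pose, each
    candidate contributes with a softmax weight derived from its own
    loss, so the optimizer is less likely to lock into a local minimum.
    per_pose_losses: (P,) photometric loss of each candidate pose.
    """
    w = np.exp(-per_pose_losses / temperature)
    w = w / w.sum()                      # normalized candidate weights
    return float(w @ per_pose_losses), w

loss, weights = multiplex_loss(np.array([0.9, 0.2, 0.5]))
# the candidate with the lowest loss receives the largest weight
```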

Not all input images are equally useful for optimization, as they vary in quality and noise. Thus, SAMURAI uses posterior scaling of the input images, which weighs each image's influence on the optimization.
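Conceptually, such a scheme can down-weight images whose reconstruction loss stays high (e.g. a bad mask or a pose that never converges). The mapping below is an illustrative sketch, not the paper's exact formulation:

```python
import numpy as np

def image_weights(running_losses, temperature=0.5):
    """Derive per-image weights from smoothed reconstruction losses.

    Images that consistently fit poorly get small weights, so they
    contribute less gradient to the shared shape/BRDF estimate.
    running_losses: (I,) smoothed per-image losses.
    """
    w = np.exp(-running_losses / temperature)
    return w / w.max()  # scale relative to the best-fitting image

w = image_weights(np.array([0.1, 0.1, 2.0]))
# the third image fits poorly and is strongly down-weighted
```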

SAMURAI overview. Source: https://arxiv.org/pdf/2205.15768.pdf

SAMURAI produces 3D models for various visual applications, including AR/VR, gaming, material editing, etc. SAMURAI can extract explicit meshes with BRDF textures, making the resulting 3D models easily usable in current graphics engines. When applied to existing datasets, SAMURAI showed superior view synthesis and relighting results.

This was a brief summary of SAMURAI. You can find useful links below if you are interested in learning more about it.

This article was written as a research summary by Marktechpost Staff based on the research paper 'SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections'. All credit for this research goes to the researchers on this project. Check out the paper and code.


Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.