This article is based on the research paper 'NeurMiPs: Neural Mixture of Planar Experts for View Synthesis'. All credit for this research goes to the researchers of this paper.
Technology is advancing so quickly that, in the future, people may be able to explore the world without leaving their rooms: as they move forward, new details will pop into view, and as they move sideways, occluded regions will reappear. Making this scenario a reality requires innovations in several domains. One of them is novel view synthesis (NVS): re-rendering a captured scene photo-realistically from new viewpoints. To be usable everywhere, an NVS system must also be lightweight, fast, and memory-efficient. Experts have proposed several methods for reproducing the visual world. One approach is image-based rendering (IBR), which models the scene with proxy geometry such as point clouds and adapts visual features from existing views. IBR can render high-quality images, but it consumes a lot of memory and depends on that proxy geometry.
In contrast, neural radiance fields synthesize highly realistic images while consuming little memory, and they can handle complex geometry and scene effects that are challenging for traditional methods. Their weakness is surface modeling: if surfaces are not modeled properly, the scene geometry cannot be captured precisely, which leads to rendering artifacts.
This paper seeks a 3D scene representation that is expressive, effective, compact, and generalizable. The work models real-world surfaces with piece-wise local planar structures. In contrast to multi-plane imagery, each plane may have an arbitrary orientation, position, and size. This enables fast rendering and eliminates computation in empty space.
This work proposes a novel neural representation, a mixture of planar experts, together with a neural rendering method built on it, termed NeurMiPs. The scene is first represented as a mixture of local planar surfaces, each an oriented 2D rectangle. A neural radiance field function attached to each plane encodes its view-dependent appearance and transparency, and both the plane geometry and the radiance fields are learned from the input images. At rendering time, a ray-rectangle intersection test is performed for each plane, the coordinates of the intersection points are used to evaluate color and transparency, and the final ray color is computed by alpha-blending the colors of all intersected rectangles. The proposed 3D architecture and related neural rendering approaches are illustrated in Figure 1.
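The rendering steps above can be sketched in a few lines of NumPy. The plane parameterization below (center, normal, two in-plane axes, half-sizes) and the function names are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def ray_rect_intersect(origin, direction, center, normal, axes, half_sizes):
    """Intersect a ray with an oriented 2D rectangle.

    Returns (t, hit point, local (u, v) coords), or None if the ray misses.
    `axes` holds the rectangle's two in-plane unit vectors.
    """
    denom = np.dot(direction, normal)
    if abs(denom) < 1e-8:          # ray parallel to the plane
        return None
    t = np.dot(center - origin, normal) / denom
    if t < 0:                      # intersection behind the ray origin
        return None
    hit = origin + t * direction
    local = hit - center
    u = np.dot(local, axes[0])
    v = np.dot(local, axes[1])
    if abs(u) > half_sizes[0] or abs(v) > half_sizes[1]:
        return None                # hits the plane but outside the rectangle
    return t, hit, (u, v)

def alpha_composite(colors, alphas):
    """Front-to-back alpha blending of per-plane colors, sorted by depth."""
    out = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a
    return out
```

For example, a ray shot down the z-axis against a unit rectangle at z = 1 facing the camera intersects at t = 1, while the same ray offset far to the side hits only the infinite plane and is rejected by the rectangle bounds check.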
Each planar expert is a multi-layer perceptron (MLP) with three fully connected hidden layers, ReLU activations on the hidden layers, and a sigmoid activation on the final output; the mixture of planar experts is fitted to the surface geometry. At test time, a pixel is rendered by shooting a ray from the eye and evaluating the radiance along it. To decide whether a local plane intersects the ray, the ray is first intersected with the infinite plane containing each rectangle; only the rectangles actually hit by the ray are retained, and their color and transparency are evaluated at the intersection points. Alpha composition then yields the final estimate of the ray's color.
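A minimal NumPy sketch of one planar expert's network follows. The hidden width of 64, the input dimension, and the 4-channel (RGB plus alpha) output are assumptions for illustration; the paper's exact layer widths and input encodings may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden=64, out_dim=4):
    """Random He-initialized weights for a 3-hidden-layer MLP."""
    dims = [in_dim, hidden, hidden, hidden, out_dim]
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])]

def mlp_forward(params, x):
    """ReLU on the hidden layers, sigmoid on the final output,
    so each output channel lands in (0, 1)."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)             # ReLU
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))      # sigmoid
```

The sigmoid output range suits colors and transparency, both of which live in [0, 1].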
Training NeurMiPs requires optimizing both the plane geometry and the radiance fields. The plane geometric parameters are initialized from the coarse 3D point cloud estimated by structure-from-motion, after which geometry and radiance are jointly optimized. A large-capacity NeRF model is trained as a teacher to distill knowledge: after fitting to the teacher network, the plane parameters are frozen and the student radiance field models are fine-tuned to enhance rendering quality.
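Initializing a plane from a segment of the structure-from-motion point cloud amounts to a least-squares plane fit. The standard SVD-based fit below is a generic sketch of that step, not necessarily the authors' exact procedure: the normal is the direction of least variance of the centered points.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit to an (N, 3) point-cloud segment.

    Returns (center, unit normal). The normal is the right singular
    vector of the centered points with the smallest singular value.
    """
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center)
    normal = vt[-1]
    return center, normal
```

For a noise-free segment lying in the z = 0 plane, the recovered normal is (0, 0, ±1).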
For faster inference, the alpha values of each rectangular plane are pre-rendered and baked. Early ray termination is applied to avoid further network evaluations once a ray is saturated. A custom CUDA kernel fuses ray-plane intersection, alpha composition, and model inference.
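The early-termination logic can be sketched in Python: compositing proceeds front to back and stops once the accumulated transmittance is negligible, so the remaining (expensive) per-plane evaluations are skipped. The threshold value here is an assumption.

```python
import numpy as np

def composite_with_early_termination(colors, alphas, eps=1e-3):
    """Front-to-back compositing that stops once transmittance < eps.

    Returns the composited color and how many planes were evaluated,
    to show the saved work.
    """
    out = np.zeros(3)
    transmittance = 1.0
    evaluated = 0
    for c, a in zip(colors, alphas):
        out += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a
        evaluated += 1
        if transmittance < eps:   # remaining planes contribute almost nothing
            break
    return out, evaluated
```

If the first plane a ray hits is fully opaque, only one of the three planes is ever evaluated.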
The proposed approach is evaluated on two datasets, Replica and Tanks & Temples. Replica is a simulated dataset of various indoor scenes with high-resolution geometry and photorealistic textures. Seven scenes are selected at random, and 50 training images and 100 test images are rendered for each scene using BlenderProc as the physically based rendering engine, at a resolution of 512 × 512 pixels. Tanks & Temples covers five real-world high-resolution scenes captured from surrounding 360° views.
The work is evaluated quantitatively using Peak Signal-to-Noise Ratio (PSNR), the structural similarity index (SSIM), and the perceptual metric LPIPS. The algorithm is compared with the most promising techniques: neural radiance fields (NeRF), the MPI-based method NeX, and hybrid real-time methods such as NSVF, KiloNeRF, and PlenOctrees.
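Of these metrics, PSNR is the simplest: it is derived directly from the mean squared error between the rendered and ground-truth images. A minimal implementation (for images normalized to [0, 1]):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB; higher means closer to ground truth."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A uniform pixel error of 0.1 gives an MSE of 0.01 and hence a PSNR of 20 dB. SSIM and LPIPS are more involved; library implementations (e.g. in scikit-image for SSIM) are typically used.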
This paper proposes NeurMiPs, a novel 3D representation for novel view synthesis. Compared with neural surface rendering, it consumes less memory and rendering time, and it is considerably more sample-efficient, with better extrapolation, than volume rendering. The proposed method generalizes the planar geometry of multi-plane images and achieves superior performance over state-of-the-art techniques on a new, challenging benchmark for view extrapolation.