Do you remember those advanced computers in sci-fi movies where everything is in 3D, and you can move what you see with your fingers, look at it from different angles, walk around the room, and so on? Have you ever wanted to experience that? If so, your best bet for generating a realistic 3D model of an object or scene today is NeRF, short for neural radiance fields.
NeRF is a cutting-edge technique that uses deep learning to generate high-quality 3D models from 2D images. It learns a continuous representation of a scene’s radiance field, which describes how light behaves as it travels through the scene. Previously, learning how light behaves in a scene meant tracing rays from every single angle, which is hugely time-consuming and computationally complex. NeRF instead uses a neural network to learn this representation, and it can accurately capture the lighting and shading of the real world.
To generate a NeRF model, the neural network is trained to predict the color and opacity of a point in 3D space given its coordinates. You can then use this volumetric representation to generate novel views of the scene from any viewpoint you want and interact with it in various ways.
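To make the "color and opacity at a point" idea concrete, here is a minimal sketch of how samples along a single camera ray can be blended into one pixel using the standard volume rendering sum. The function `toy_field` is a hypothetical stand-in for the trained network (a fuzzy red sphere, not a real model), and the sample count and ray bounds are arbitrary assumptions.

```python
import math

def toy_field(x, y, z):
    """Hypothetical stand-in for the trained network: returns (rgb, density)."""
    # A fuzzy red sphere of radius 1 at the origin; empty space elsewhere.
    inside = x * x + y * y + z * z < 1.0
    return ((1.0, 0.0, 0.0), 5.0) if inside else ((0.0, 0.0, 0.0), 0.0)

def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Blend colors along one ray, front to back."""
    step = (far - near) / n_samples
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * step                      # midpoint of the i-th segment
        point = [o + t * d for o, d in zip(origin, direction)]
        rgb, density = toy_field(*point)
        alpha = 1.0 - math.exp(-density * step)          # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# One ray through the sphere, one ray that misses it entirely.
hit_pixel, hit_T = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0))
miss_pixel, miss_T = render_ray((3.0, 0.0, -2.0), (0.0, 0.0, 1.0))
```

The ray that crosses the sphere comes out almost fully red with very little light left over, while the ray that misses stays black. Rendering a full image repeats this for every pixel, which is where the cost discussed next comes from.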
Although NeRF is the best bet we have for generating realistic 3D scenes from images captured in the real world, it is far from perfect. The biggest challenge is complexity: all of this prediction and rendering does not come cheap. If you want a NeRF model of your favorite Funko Pop!, you can probably build one with a powerful GPU. However, generating a NeRF model of your garden is much trickier, because rendering a scene of that size puts a real burden on GPU memory and calls for extremely powerful hardware.
What if we don’t have an extremely powerful computer with GBs of memory? Do we have to stick to NeRF models of small objects? Well, no. You can actually use NeRF even if you do not have a space-grade computer. Time to meet MERF.
MERF, short for Memory-Efficient Radiance Fields, is designed to achieve real-time rendering of large-scale scenes using a fraction of the memory that existing NeRF models need. It does so by carefully balancing two trade-offs that every volumetric modeling method has to face.
The first trade-off is between volume and surface representations. Purely volumetric rendering models are easier to optimize using gradient-based methods and produce high-quality view synthesis results. On the other hand, surface-like representations, which are sparser and more compact, are cheaper to render but may result in lower image quality.
The second trade-off is between being memory-bound and being compute-bound. The most compact representations require many operations per query, while the fastest representations consume large amounts of memory. One way to tackle this is to use a slower but more compact volumetric model for optimization and then “bake” it into a larger and faster representation for rendering. However, baking can lead to a significant drop in image quality, and fine-tuning the baked representation may not scale well to larger scenes, because computing gradients requires more memory than rendering does.
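If "baking" sounds abstract, the toy sketch below shows the general idea in one dimension: a compact function that is computed on demand gets precomputed into a lookup table, which is faster to query but takes more memory the finer it gets. Everything here (`compact_field`, the table sizes, the linear interpolation) is a hypothetical illustration of the trade-off, not the baking procedure used in any particular NeRF method.

```python
import math

def compact_field(x):
    """Hypothetical compact-but-slow model: the value is computed on demand."""
    return math.sin(2.0 * math.pi * x) ** 2

def bake(field, resolution):
    """Precompute the field at evenly spaced points in [0, 1]."""
    return [field(i / (resolution - 1)) for i in range(resolution)]

def lookup(table, x):
    """Fast query: linear interpolation between the two nearest baked samples."""
    pos = x * (len(table) - 1)
    lo = min(int(pos), len(table) - 2)
    frac = pos - lo
    return (1.0 - frac) * table[lo] + frac * table[lo + 1]

def max_error(table, n_probe=1000):
    """Largest gap between the baked table and the original field."""
    return max(abs(lookup(table, i / n_probe) - compact_field(i / n_probe))
               for i in range(n_probe + 1))

coarse = bake(compact_field, 8)     # small table, visible quality loss
fine = bake(compact_field, 256)     # 32x more memory, much closer to the original
coarse_error = max_error(coarse)
fine_error = max_error(fine)
```

The coarse table is cheap to store but drifts noticeably from the original function, while the fine table tracks it closely at a much higher memory cost. In 3D the table size grows with the cube of the resolution, which is why this trade-off bites so hard for large scenes.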
MERF aims to hit the sweet spot for both of these trade-offs. It combines a voxel grid with a triplane data structure, which makes it memory efficient. To reduce memory further during optimization, the NGP hash grid structure is used to compress the parameterization. Doing so enables differentiable sparsification and helps with convergence. Once the optimization step is done, the NGP is converted into a binary occupancy grid, which is far more efficient for rendering. Finally, the NGP-parameterized MERF and the baked MERF represent the same radiance field function, so the high quality achieved during optimization carries over to real-time rendering.
MERF is a compressed volume representation for radiance fields. It can achieve real-time rendering in a web browser using consumer-grade hardware. You can find an interactive demo on the project website if you want to try it yourself.
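To see why pairing a coarse voxel grid with three fine 2D planes saves memory, here is a rough sketch of such a hybrid lookup. The resolutions, the single scalar feature channel, the nearest-cell lookup, and the choice to sum the four features are all simplifying assumptions made for illustration; an actual implementation would interpolate between cells and store several channels per cell.

```python
LOW_RES = 4     # hypothetical side length of the coarse 3D grid
HIGH_RES = 16   # hypothetical side length of each 2D plane

# Stand-ins for learned feature values (one scalar channel for brevity).
voxels = [[[0.1 * (i + j + k) for k in range(LOW_RES)]
           for j in range(LOW_RES)] for i in range(LOW_RES)]
plane_xy = [[0.01 * (u + v) for v in range(HIGH_RES)] for u in range(HIGH_RES)]
plane_xz = [[0.02 * u for v in range(HIGH_RES)] for u in range(HIGH_RES)]
plane_yz = [[0.03 * v for v in range(HIGH_RES)] for u in range(HIGH_RES)]

def to_index(coord, resolution):
    """Map a coordinate in [0, 1] to the index of the cell containing it."""
    return min(int(coord * resolution), resolution - 1)

def query(x, y, z):
    """Sum the coarse 3D feature with the three fine 2D plane features."""
    i, j, k = (to_index(c, LOW_RES) for c in (x, y, z))
    u, v, w = (to_index(c, HIGH_RES) for c in (x, y, z))
    return voxels[i][j][k] + plane_xy[u][v] + plane_xz[u][w] + plane_yz[v][w]

def stored_values(low_res, high_res):
    """Number of stored cells: hybrid layout versus one dense high-res grid."""
    hybrid = low_res ** 3 + 3 * high_res ** 2
    dense = high_res ** 3
    return hybrid, dense

hybrid_cells, dense_cells = stored_values(LOW_RES, HIGH_RES)  # (832, 4096)
```

Even at these toy sizes the hybrid layout stores about a fifth of the cells a dense grid at the fine resolution would need, and the gap widens quickly as the fine resolution grows, since the planes scale with the square of the resolution rather than the cube.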
Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.