This AI Research Unveils Photo-SLAM: Elevating Real-Time Photorealistic Mapping on Portable Devices

In computer vision and robotics, simultaneous localization and mapping (SLAM) with cameras is a key topic that aims to allow autonomous systems to navigate and understand their environment. Geometric mapping is the main emphasis of traditional SLAM systems, which produce precise but aesthetically basic representations of the surroundings. Nonetheless, recent advances in neural rendering have shown that it is possible to incorporate photorealistic image reconstruction into the SLAM process, which might improve robotic systems’ perception abilities. 

Existing approaches significantly rely on implicit representations, making them computationally demanding and unsuitable for deployment on resource-constrained devices, even though the merging of neural rendering with SLAM has produced promising results. For example, ESLAM uses multi-scale compact tensor components, whereas Nice-SLAM uses a hierarchical grid to hold learnable features that reflect the environment. Subsequently, they collaborate to estimate camera positions and maximize features by reducing the reconstruction loss of many ray samples. The process of optimization takes a lot of time. Therefore, to guarantee effective convergence, they must integrate relevant depth information from several sources, such as RGB-D cameras, dense optical flow estimators, or monocular depth estimators. Furthermore, because the multi-layer perceptrons (MLP) decode the implicit features, it is usually required to specify a boundary region precisely to normalize ray sampling for best results. It restricts the system’s potential to scale. These restrictions suggest that one of the primary goals of SLAM real-time exploration and mapping capabilities in an unfamiliar area utilizing portable platforms cannot be achieved. 

In this publication, the research team from The Hong Kong University of Science and Technology and Sun Yat-sen University present Photo-SLAM. This novel framework performs online photorealistic mapping and exact localization while addressing current approaches’ scalability and computing resource limitations. The research team keep track of a hyper primitives map of point clouds that hold rotation, scaling, density, spherical harmonic (SH) coefficients, and ORB characteristics. By backpropagating the loss between the original and rendered pictures, the hyper primitive’s map enables the system to learn the corresponding mapping and optimize tracking using a factor graph solver. Rather than using ray sampling, 3D Gaussian splatting is used to produce the pictures. While introducing a 3D Gaussian splatting renderer can lower the cost of view reconstruction, it cannot produce high-fidelity rendering for online incremental mapping, especially when the situation is monocular. In addition, the study team suggests a geometry-based densification technique and a Gaussian Pyramid-based (GP) learning method to accomplish high-quality mapping without depending on dense depth information. 

Figure 1: Photo-SLAM is a revolutionary real-time framework that supports RGB-D, stereo, and monocular cameras for simultaneous localization and photorealistic mapping. With a render speed of up to 1000 frames per second, it can rebuild high-fidelity scene views.

Crucially, GP learning makes it easier for multi-level features to be acquired gradually, significantly improving the system’s mapping performance. The study team used a variety of datasets taken by RGB-D, stereo, and monocular cameras in their lengthy trials to assess the effectiveness of their suggested method. The findings of this experiment clearly show that PhotoSLAM achieves state-of-the-art performance in terms of rendering speed, photorealistic mapping quality, and localization efficiency. Moreover, the Photo-SLAM system’s real-time operation on embedded devices demonstrates its potential for useful robotics applications. Figs. 1 and 2 show the schematic overview of Photo-SLAM in action. 

Figure 2 shows the four key components of Photo-SLAM, which maintains a map with hyperprimitive elements and consists of localization, explicit geometry mapping, implicit photorealistic mapping, and loop closure components.

This work’s primary achievements are the following: 

• The research team created the first photorealistic mapping system based on hyper primitives map and simultaneous localization. The new framework works with indoor and outdoor monocular, stereo, and RGB-D cameras. 

• The research team suggested using Gaussian Pyramid learning, which enables the model to learn multi-level features effectively and rapidly, resulting in high-fidelity mapping. The system can operate at real-time speed even on embedded systems, achieving state-of-the-art performance thanks to its complete C++ and CUDA implementation. There will be public access to the code.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter..

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology(IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing and is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

🐝 Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...