Augmented reality (AR) is on its way to becoming part of our daily lives. It can be defined as placing a virtual object in the real world so that it preserves its location and appearance over time until it is removed from the scene. These scenarios require AR devices to determine their 6-DoF pose (position and orientation) at every moment so that virtual content can be consistently overlaid on the real environment with pixel-level precision.
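To make the 6-DoF requirement concrete, here is a minimal sketch (not from the paper) of why pose accuracy translates directly into pixel accuracy: a virtual anchor is projected into the camera image through the device's pose and a pinhole camera model. All numeric values are hypothetical.

```python
import numpy as np

def project_anchor(point_world, R, t, K):
    """Project a world-space virtual anchor into the camera image.

    R, t: the camera's 6-DoF pose (world-to-camera rotation and translation).
    K: 3x3 camera intrinsics. Any pose error shifts the resulting pixel.
    """
    p_cam = R @ point_world + t      # world -> camera coordinates
    u, v, w = K @ p_cam              # pinhole projection
    return np.array([u / w, v / w])  # pixel coordinates

# Hypothetical setup: camera at the origin looking down +Z, anchor 2 m ahead.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
pixel = project_anchor(np.array([0.0, 0.0, 2.0]), R, t, K)
print(pixel)  # anchor lands at the image center, (320, 240)
```

A small rotation or translation error in `R, t` moves `pixel` by many pixels at typical focal lengths, which is why AR tracking must be accurate at every frame.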
Visual localization and mapping have been studied intensively in the computer vision domain. However, their application to AR can be tricky and presents unique challenges.
One of these challenges concerns the devices used to view AR content: mostly mobile phones or dedicated AR headsets such as Microsoft's HoloLens. These devices are equipped with multiple cameras and additional sensors, which makes it difficult to map and localize AR content with methods designed for single-camera setups.
Moreover, we follow distinctive hand and head motion patterns when viewing AR content on these devices. Their on-device real-time tracking systems provide spatially-posed sensor streams: each sensor frame comes with an estimated 6-DoF pose that relates it to the others in space. However, in many AR scenarios content must persist beyond a single local tracking session, while the scene itself changes over time. The AR tracking system should therefore be robust against temporal changes in appearance and structure.
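As an illustration of what "spatially posed" means in practice, the following sketch models a pose-annotated sensor record. This is a hypothetical schema, not LaMAR's actual data format; the point is that any two posed frames can be related through their tracking poses.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PosedFrame:
    """One spatially-posed sensor sample (hypothetical schema)."""
    timestamp: float   # seconds, device clock
    sensor_id: str     # e.g. "cam_left", "depth", "imu"
    pose: np.ndarray   # 4x4 device-to-world transform from on-device tracking
    data: np.ndarray   # the raw sample (image, depth map, ...)

def relative_transform(a: PosedFrame, b: PosedFrame) -> np.ndarray:
    """4x4 transform mapping points from frame a's coordinates into frame b's."""
    return np.linalg.inv(b.pose) @ a.pose

# Hypothetical usage: a second sensor offset 1 m along x from the first.
pose_b = np.eye(4)
pose_b[:3, 3] = [1.0, 0.0, 0.0]
f_a = PosedFrame(0.0, "cam_left", np.eye(4), np.zeros(1))
f_b = PosedFrame(0.0, "depth", pose_b, np.zeros(1))
T_ab = relative_transform(f_a, f_b)  # translation is (-1, 0, 0) in b's frame
```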
Another challenge is the temporal sensor data itself. Many streams of data arrive from different sensors, and the device must make sense of all of them in real time. This is crucial: if the device cannot keep up with the incoming data, the experience degrades for the user.
Finally, as more people adopt AR, there will be more opportunities to build crowd-sourced, large-scale maps from data contributed by various devices. This will not be straightforward, however, as challenges such as ensuring robust algorithms and preserving privacy must be addressed.
Despite all these challenges in the AR domain, current academic research is mainly driven by benchmarks that address none of them. This is where LaMAR comes into play: a robust and realistic benchmark for AR studies focusing on localization and mapping. LaMAR makes three main contributions.
The first contribution is a large-scale dataset captured using AR devices in various contexts, including a historical building, a multi-story office building, and part of a city center. The dataset contains both indoor and outdoor scenes with illumination and semantic changes, as well as dynamic objects. Data was captured using both hand-held devices such as the iPad and head-mounted devices such as the HoloLens over the span of a year.
The second contribution is a pipeline that automatically produces accurate ground-truth AR trajectories by registering them against large-scale 3D laser scans. The pipeline can handle crowd-sourced data from heterogeneous devices, making it possible to extend the dataset with new captures and different device types.
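The paper's pipeline is far more involved, but the core idea of registering a trajectory against a reference can be sketched with a rigid SVD-based (Kabsch) alignment between corresponding positions. The data below is synthetic and hypothetical; it only illustrates the alignment step, not LaMAR's actual method.

```python
import numpy as np

def rigid_align(src, dst):
    """Kabsch alignment: find R, t minimizing ||(src @ R.T + t) - dst||.

    src, dst: (N, 3) arrays of corresponding trajectory positions,
    e.g. a device trajectory and its laser-scan-frame counterpart.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)        # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Synthetic example: a trajectory in the device's local frame, and the
# same positions expressed in the reference frame (rotated + translated).
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true
R, t = rigid_align(src, dst)  # recovers R_true, t_true
```

A real pipeline must also establish the correspondences themselves and handle noise and outliers (e.g. with robust estimation), which is where most of the difficulty lies.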
Finally, a detailed evaluation of localization and mapping techniques in the AR setting is presented, yielding novel insights for future research directions.
This was a brief summary of LaMAR, the new benchmark for AR localization and mapping. You can find more information in the links below if you are interested in learning more about it.
This article is a research summary written by Marktechpost Staff based on the research paper 'LaMAR: Benchmarking Localization and Mapping for Augmented Reality'. All credit for this research goes to the researchers on this project. Check out the paper, code, and project.
Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.