Researchers From China Propose ‘NeuralMarker’: A Framework for Learning General Marker Correspondence

Imagine you draw a line on a particular video frame and want to preserve that line in the frames that follow. To do so, you need to find the locations in each frame that correspond to the reference frame and track pixel movement precisely under conditions such as viewpoint change, deformation, and lighting change. Estimating all of these changes is extremely challenging.

The goal of general marker correspondence estimation is to find, for each pixel of the query marker, the matching position in a reference image. A general marker is an arbitrary marker provided by the user. General marker correspondence is a fundamental component of several downstream applications, including marker-based Augmented Reality (AR) and video editing. It can be used to embed advertisements into videos, speed up video editing, add objects to augmented reality scenes, perform lighting-preserving image editing, and more.

Example use cases of NeuralMarker.

Traditional methods estimate marker correspondence by fitting a homography to sparse feature matches. These models can only handle rigid SE(3) transformations of a plane (i.e., rotation and translation). Therefore, they cannot handle distorted markers and fail to provide good results when the transformation is too large.
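To make the planar limitation concrete, here is a minimal sketch of homography fitting from four point matches, written in plain Python. It is not taken from the paper: the function names, the corner coordinates, and the "bent marker" offset are all hypothetical, and real pipelines typically fit many noisy matches with a robust estimator rather than solving exactly from four.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(src, dst):
    """Fit a 3x3 homography (bottom-right entry fixed to 1) from four pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map a 2D point through h, including the perspective division."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical matches: marker corners and where they appear in the image.
marker_corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image_corners = [(10.0, 20.0), (110.0, 25.0), (105.0, 130.0), (5.0, 120.0)]
H = fit_homography(marker_corners, image_corners)

# The planar model fixes where the marker centre must land...
predicted_centre = apply_homography(H, (0.5, 0.5))
# ...but on a bent marker the true centre can sit elsewhere (made-up offset).
observed_centre = (predicted_centre[0] + 8.0, predicted_centre[1] - 6.0)
residual = ((predicted_centre[0] - observed_centre[0]) ** 2
            + (predicted_centre[1] - observed_centre[1]) ** 2) ** 0.5
```

Once the four corners are matched, every interior point is fully determined by the eight homography parameters, so any deformation away from the plane shows up as a residual the model cannot absorb.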

Deep learning-based methods have provided a significant performance boost in many domains, and correspondence estimation is no exception. Although they achieve impressive performance, they are data-hungry, and annotating pixel-wise dense correspondences to train a marker correspondence model is time-consuming and expensive. Therefore, NeuralMarker relies on a synthetic dataset for its dense ground-truth correspondences.

The main challenges in correspondence estimation come from two aspects, geometry estimation and appearance estimation, and NeuralMarker is designed to tackle both.

First, the FlyingMarkers synthetic dataset is generated, which consists of marker-image pairs with dense ground-truth correspondences. FlyingMarkers generates a synthetic reference image by warping a marker with a randomly generated geometric transformation and blending it with a background image. Training on FlyingMarkers encourages the neural network to learn a variety of marker movements.
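The warp-and-blend idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the FlyingMarkers generator itself: the transformation family (rotation, scale, shift), the parameter ranges, the alpha blending, and the names `random_affine` and `make_training_pair` are all hypothetical.

```python
import math
import random


def random_affine(rng, max_shift=4.0, max_angle=0.3, scale_range=(0.8, 1.2)):
    """Sample a random rotation + scale + shift as a 2x3 matrix (assumed family)."""
    angle = rng.uniform(-max_angle, max_angle)
    scale = rng.uniform(*scale_range)
    tx = rng.uniform(0.0, max_shift)
    ty = rng.uniform(0.0, max_shift)
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    return [[c, -s, tx], [s, c, ty]]


def make_training_pair(marker, background, rng, alpha=0.9):
    """Warp `marker` onto a copy of `background`.

    Returns the synthetic reference image and a dict that maps every marker
    pixel (x, y) to its exact sub-pixel location (u, v) in that image.
    """
    t = random_affine(rng)
    image = [row[:] for row in background]  # leave the background untouched
    height, width = len(image), len(image[0])
    correspondence = {}
    for y, row in enumerate(marker):
        for x, value in enumerate(row):
            u = t[0][0] * x + t[0][1] * y + t[0][2]
            v = t[1][0] * x + t[1][1] * y + t[1][2]
            correspondence[(x, y)] = (u, v)  # ground truth comes for free
            # Nearest-pixel forward splat; a fuller pipeline would use
            # inverse warping with interpolation to avoid holes.
            ui, vi = round(u), round(v)
            if 0 <= ui < width and 0 <= vi < height:
                image[vi][ui] = alpha * value + (1 - alpha) * image[vi][ui]
    return image, correspondence


rng = random.Random(0)
marker = [[1.0] * 8 for _ in range(8)]          # toy 8x8 white marker
background = [[0.0] * 32 for _ in range(32)]    # toy 32x32 black background
image, correspondence = make_training_pair(marker, background, rng)
```

Because the transformation is generated rather than estimated, the dense correspondence for every marker pixel is known exactly, which is what makes synthetic data attractive when manual annotation is too expensive.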

Second, a new loss function, the Symmetric Epipolar Distance (SED) loss, is proposed, which enables learning dense correspondence from posed images. In real-world situations, the reference image may contain large appearance variations that are challenging to synthesize. In addition, if the image feature encoder is trained only on synthetic images, it will be biased toward them. Thus, NeuralMarker is also trained on real photographs to account for the variety of real-world appearances. The SED loss constrains the predicted correspondences using the camera poses. Thanks to training with the SED loss, NeuralMarker is robust in adverse lighting conditions and avoids synthetic image bias.
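For intuition, here is a sketch of the textbook symmetric epipolar distance for a single correspondence. It assumes the relative camera pose has already been turned into a fundamental matrix `f`. The exact normalization used in the NeuralMarker loss may differ, and the helper names and the toy matrix are illustrative only.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def symmetric_epipolar_distance(f, p, q):
    """Squared symmetric epipolar distance between pixel p (image 1)
    and pixel q (image 2) under fundamental matrix f."""
    x = [p[0], p[1], 1.0]
    xp = [q[0], q[1], 1.0]
    line_in_2 = mat_vec(f, x)               # epipolar line of p in image 2
    line_in_1 = mat_vec(transpose(f), xp)   # epipolar line of q in image 1
    algebraic = sum(a * b for a, b in zip(xp, line_in_2))  # q^T F p
    # Sum of the squared point-to-line distances in both images.
    return algebraic ** 2 * (
        1.0 / (line_in_2[0] ** 2 + line_in_2[1] ** 2)
        + 1.0 / (line_in_1[0] ** 2 + line_in_1[1] ** 2)
    )


# Toy case: a camera that moves purely sideways, so epipolar lines are
# horizontal and a correct match must keep the same row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
on_line = symmetric_epipolar_distance(F, (3.0, 5.0), (10.0, 5.0))   # same row
off_line = symmetric_epipolar_distance(F, (3.0, 5.0), (10.0, 7.0))  # 2 rows off
```

The distance is zero whenever the predicted match lies on the epipolar line, so it supervises correspondences using only camera poses, without requiring per-pixel ground truth on real images. Averaging it over all predicted correspondences would give one plausible form of the training loss.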

Without relying on a homography model, NeuralMarker directly predicts dense pixel-level correspondences from the complete marker and image, making full use of appearance information and removing the planar restriction. According to the authors, it significantly outperforms existing methods and enables interesting new applications, such as video editing and augmented reality in challenging lighting conditions.

This was a brief summary of the NeuralMarker paper. The authors have an excellent demo website with examples and the code if you want to learn more about it.

This Article is written as a research summary article by Marktechpost Staff based on the research paper 'NeuralMarker: A Framework for Learning General Marker Correspondence'. All Credit For This Research Goes To Researchers on This Project. Check out the paper.


Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.
