Meet STEPS: A New Computer Vision Method That Jointly Learns A Nighttime Image Enhancer And A Depth Estimator Without Using Ground Truth

Self-supervised depth estimation has attracted considerable research interest in recent years because it requires only low-cost hardware and can strengthen the 3D sensing capabilities of self-driving vehicles. By using the underlying geometry of image sequences as supervision, these methods produce encouraging results: on datasets such as KITTI and Cityscapes, their performance is comparable to that of supervised approaches.

However, existing research on image-based depth estimation mostly focuses on daytime image sequences, where the inputs are well lit and the photometric consistency assumption (that a scene point keeps roughly the same brightness from frame to frame) typically holds. At night, this assumption often breaks down. Researchers therefore frequently apply nighttime image enhancement techniques to improve the quality of the input images before estimating depth. But because existing paired day/night datasets concentrate on indoor settings, supervised nighttime image enhancers are frequently constrained by dataset bias, and collecting such paired datasets for dynamic road scenes, as self-driving applications would require, is very difficult.
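To make the photometric consistency assumption concrete, here is a minimal Python sketch. It is an illustration only, not code from the STEPS project: the function name `photometric_error` and the toy 4x4 "scene" are assumptions made for this example. It shows that even when two frames are perfectly aligned, a change in brightness (as might happen under flickering or saturating artificial lights at night) produces a non-zero photometric error, which a self-supervised depth model would wrongly try to explain through geometry.

```python
import numpy as np


def photometric_error(target, reconstructed):
    """Mean absolute brightness difference between two aligned images."""
    return float(np.mean(np.abs(target - reconstructed)))


# Toy grayscale "scene" with intensities in [0.2, 0.8) (hypothetical data).
rng = np.random.default_rng(0)
scene = rng.uniform(0.2, 0.8, size=(4, 4))

# Same viewpoint and same lighting: the assumption holds exactly.
same_lighting = scene.copy()

# Same viewpoint, but 1.5x brighter and clipped at 1.0 (saturation).
brighter = np.clip(scene * 1.5, 0.0, 1.0)

print(photometric_error(scene, same_lighting))  # 0.0
print(photometric_error(scene, brighter))       # strictly positive
```

Real self-supervised pipelines compare a target frame against a neighboring frame warped with the predicted depth and camera motion, and often add a structural similarity term; the plain L1 difference above is only the simplest form of the idea.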

To address this issue, researchers from Tsinghua University’s Institute for AI Industry Research (AIR) and the Chinese Academy of Sciences introduced STEPS (Joint Self-supervised Nighttime Image Enhancement and Depth Estimation). According to the researchers, it is the first framework that jointly learns a nighttime image enhancer and a single-view depth estimator without relying on ground truth for either task. It uses a newly proposed pixel masking scheme to tightly entangle the two self-supervised tasks. On public benchmarks, the approach significantly outperforms current state-of-the-art methods.

The pixel masking strategy is the foundation of the framework. It builds on the team’s observation that nighttime images suffer not only from underexposed regions but also from overexposed ones (together referred to as unexpected regions). Both kinds of region lose fine-grained detail and hinder the model’s ability to estimate precise depth from local contextual cues. To identify such regions, the researchers used the illumination component produced by the self-supervised nighttime image enhancer. They also proposed a bridge-shaped model for soft auto-masking, in which both kinds of region are suppressed naturally by fitting a bridge-shaped curve to the illumination map distribution.
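The idea of a bridge-shaped soft mask can be sketched in a few lines of Python. This is a hedged illustration, not the formula from the paper: the product-of-two-sigmoids curve, the function names `bridge_mask` and `masked_photometric_error`, and the parameters `low`, `high`, and `sharpness` are all assumptions chosen for this example, whereas STEPS fits its curve to the illumination map distribution. The sketch only shows the general behavior: pixels with mid-range illumination get a weight near 1, while very dark and very bright pixels are smoothly down-weighted in the loss.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bridge_mask(illumination, low=0.1, high=0.9, sharpness=40.0):
    """Hypothetical bridge-shaped weight for illumination values in [0, 1].

    Close to 1 between `low` and `high`, falling smoothly toward 0 for
    underexposed (near 0) and overexposed (near 1) pixels.
    """
    illumination = np.asarray(illumination, dtype=np.float64)
    rise = sigmoid(sharpness * (illumination - low))   # suppresses dark pixels
    fall = sigmoid(sharpness * (high - illumination))  # suppresses bright pixels
    return rise * fall


def masked_photometric_error(target, reconstructed, illumination):
    """Weighted mean absolute error, with weights from the bridge mask."""
    weights = bridge_mask(illumination)
    error = np.abs(np.asarray(target) - np.asarray(reconstructed))
    return float((weights * error).sum() / (weights.sum() + 1e-8))


# Toy example: the only mismatched pixel is an underexposed one.
target = np.zeros((2, 2))
reconstructed = np.array([[1.0, 0.0], [0.0, 0.0]])
illumination = np.array([[0.0, 0.5], [0.5, 0.5]])

print(masked_photometric_error(target, reconstructed, illumination))
```

Because the mask is soft rather than a hard threshold, the weighting stays differentiable, which is what allows a masking scheme of this kind to be trained end to end together with the enhancer and the depth network.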

To address the sparse ground truth of existing datasets, the researchers first turned to CARLA, a simulator for autonomous driving research, intending to transfer knowledge learned in simulation to the real world. However, the simulated data is difficult to use directly because of the significant domain gap between simulated and real-world images. The researchers therefore proposed CARLA-EPE, a new photo-realistically enhanced nighttime dataset based on CARLA. According to their experimental evaluations, this new synthetic dataset is more challenging than existing ones, which poses significant new challenges for the field.

The researchers evaluated their method on two established datasets, nuScenes and RobotCar. RobotCar is an autonomous driving dataset of videos recorded along a consistent route under a variety of weather, traffic, and day and night conditions. nuScenes is a large autonomous driving dataset of 1,000 video clips collected in various road scenes and weather conditions. On both benchmarks, the approach achieves state-of-the-art performance. In summary, the researchers developed a self-supervised system that learns image enhancement and depth estimation simultaneously, and their work also produced a new photo-realistically enhanced nighttime dataset with dense depth ground truth. The team has publicly released its code.


Check out the Paper and Github. All credit for this research goes to the researchers on this project.

Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and enjoys deepening her technical knowledge by taking part in challenges.
