Self-supervised pretraining has proven beneficial in computer vision, especially for object detection. However, previous approaches were not designed with localization in mind, which is a key requirement for detection tasks.
AI researchers have developed DETReg (DEtection with TRansformers based on Region priors), an unsupervised pretraining approach for object detection. DETReg trains detectors on unlabeled data via two pretraining tasks: the object localization task and the object embedding task. The first task teaches the model to localize objects regardless of their category, while the second teaches it to represent the categories of objects in images. Given the simplicity of recent transformer-based detectors, the researchers base their approach on the Deformable DETR architecture, which simplifies implementation and training.
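The two pretraining tasks can be sketched as a combined loss. This is a minimal illustrative sketch, not the paper's exact formulation: the function name, the simple L1 penalties, and the `lambda_emb` weight are assumptions. It assumes class-agnostic pseudo boxes come from an unsupervised region-proposal method and target embeddings come from a self-supervised image encoder, as the region-prior idea suggests.

```python
import numpy as np

def detreg_pretrain_loss(pred_boxes, pseudo_boxes,
                         pred_embs, target_embs, lambda_emb=1.0):
    """Illustrative sketch of DETReg's two pretraining tasks.

    - Object localization: regress class-agnostic pseudo boxes
      obtained without manual annotation (assumption: from an
      unsupervised region-proposal method).
    - Object embedding: match the detector's object embeddings to
      target embeddings (assumption: from a self-supervised encoder).

    The L1 losses and the lambda_emb weight are simplifications;
    the actual method uses a matching-based detection objective.
    """
    loc_loss = np.abs(pred_boxes - pseudo_boxes).mean()
    emb_loss = np.abs(pred_embs - target_embs).mean()
    return loc_loss + lambda_emb * emb_loss
```

When the predictions match the unsupervised targets exactly, the loss is zero; any deviation in either the boxes or the embeddings increases it.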
The researchers trained DETReg on these two tasks without any manually annotated bounding boxes or categories. This allows all model parameters to be learned and produces meaningful detections even without any supervision.
The researchers evaluated DETReg on the standard MS COCO and PASCAL VOC benchmarks and found that it improved over challenging baselines across all settings, with the largest gains when little annotated data was available. In the low-data regime on MS COCO, DETReg outperforms previous supervised and unsupervised baselines when trained with only 1%, 2%, 5%, and 10% of the labeled data.