ImageNet is one of the most popular image classification benchmarks. It contains more than 14 million labeled images and has driven improvements in many image recognition models. However, it also contains a significant level of label noise.
Studies show that many ImageNet samples contain multiple object classes, even though the dataset is treated as a single-label benchmark, and its labels include a striking amount of noise. The aim, therefore, is to provide ImageNet training images with multi-label annotations and localized labels that indicate where each object is located.
Researchers from NAVER AI Lab in South Korea have proposed a computationally efficient re-labeling strategy that addresses this flaw in ImageNet. The team's method, "ReLabel," turns ImageNet training into a multi-label task by producing exhaustive multi-label annotations for each image.
Instead of expanding the ImageNet validation-set labels into multi-labels, as earlier work did, the researchers developed a method for the ImageNet training labels. Their approach transforms the single-class labels of ImageNet's 1.28 million training images into multi-class labels assigned to image regions.
The researchers used a machine annotator pre-trained at super-ImageNet scale (the JFT-300M dataset with 300 million images, or InstagramNet-1B with about 1 billion Instagram images). They then fine-tuned the machine annotator on ImageNet to predict its classes. This generates new multi-class ground-truth labels without incurring the overwhelming cost of human annotation.
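Because the annotator is a convolutional classifier, location-wise predictions can be obtained by applying its classification head at every spatial position of the feature map instead of after global average pooling. A minimal NumPy sketch of that idea (the function name, shapes, and softmax choice are illustrative assumptions, not the authors' code):

```python
import numpy as np

def dense_label_map(features, weights, bias):
    """Apply a classifier head at every spatial location (a 1x1 convolution).

    features: (D, H, W) feature map from the annotator's backbone.
    weights:  (C, D) classification-head weights; bias: (C,).
    Returns a (C, H, W) label map: a class distribution per location.
    """
    logits = np.einsum('cd,dhw->chw', weights, features) + bias[:, None, None]
    # Softmax over classes independently at each spatial location.
    z = logits - logits.max(axis=0, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)
```

The resulting dense label map can then be stored once per training image and reused in every epoch.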
The ReLabel approach lets the machine annotator generate location-wise multi-labels. A new framework, LabelPooling, then trains the image classifier with these localized labels: for each random crop used in training, it pools the corresponding region of the label map into a multi-label target, providing additional location-specific supervision. ReLabel yields steady gains across ImageNet benchmarks, transfer-learning tasks, and multi-label classification tasks.
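The pooling step can be sketched as follows. This is a simplified illustration under assumed shapes and a plain average pool (the paper uses RoI-style pooling on the stored label maps); `label_pooling` and `soft_cross_entropy` are hypothetical helper names:

```python
import numpy as np

def label_pooling(label_map, crop, eps=1e-8):
    """Pool a dense label map over a crop region into a multi-label target.

    label_map: (C, H, W) per-location class scores from the machine annotator.
    crop: (x0, y0, x1, y1) bounds of the random crop in label-map coordinates.
    Returns a length-C vector: the soft multi-label training target.
    """
    x0, y0, x1, y1 = crop
    region = label_map[:, y0:y1, x0:x1]   # scores inside the crop only
    pooled = region.mean(axis=(1, 2))     # average pool per class
    return pooled / (pooled.sum() + eps)  # normalize to a distribution

def soft_cross_entropy(logits, target):
    """Cross-entropy loss against a soft (non-one-hot) target."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -(target * logp).sum()
```

For example, a crop covering only the region where class 0 appears produces a target concentrated on class 0, so the classifier is supervised by what is actually visible in the crop rather than by the image's single global label.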
The team evaluated two state-of-the-art pre-trained classifiers as candidate machine annotators: EfficientNet-L2, trained with JFT-300M, and ResNeXt-101 32x, trained with InstagramNet-1B. They selected EfficientNet-L2 because it gave the larger boost to ResNet-50's top-1 classification accuracy. On ImageNet classification, training ResNet-50 with ReLabel achieved a top-1 accuracy of 78.9%, a +1.4 percentage-point gain over the baseline model trained with the original labels.
Because the machine annotator's label maps are generated once and then reused at every epoch, the proposed ReLabel method combined with the LabelPooling framework cuts the supervision cost for ResNet-50 training to only about 10 GPU hours, compared with roughly 328 GPU hours when the annotator's predictions must be re-computed throughout training.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring new advancements in technologies and their real-life applications.