Researchers At NAVER AI Lab Introduce ReLabel: A Novel Framework To Turn ImageNet Training Into A Multi-Label Task

ImageNet is one of the most popular image classification benchmarks. It contains more than 14 million labeled images and has driven performance improvements across many image recognition models. However, it also contains a significant level of label noise.

Studies show that many ImageNet samples contain multiple object classes, even though the dataset is treated as a single-label benchmark, and this de-facto standard for image classifiers also carries a striking level of label noise. The aim, therefore, is to provide ImageNet training images with multi-label annotations and localized labels that indicate where each object is located.

Researchers from NAVER AI Lab in South Korea have developed a computationally efficient re-labeling strategy that addresses this flaw in ImageNet. The team proposes "ReLabel," a novel re-labeling method that turns ImageNet training into a multi-label task, with exhaustive multi-label annotations per image.

Instead of expanding the ImageNet validation set labels into multi-labels, as earlier work did, the researchers developed a method for the ImageNet training labels. This approach transforms the single-class labels of the 1.28 million ImageNet training images into multi-class labels assigned to image regions.

The researchers pre-trained a machine annotator at super-ImageNet scale (the JFT-300M dataset with 300 million images, and InstagramNet-1B with about 1 billion Instagram images). They then fine-tuned the machine annotator on ImageNet to predict ImageNet classes. This generates new multi-class ground-truth labels without incurring the overwhelming costs associated with human annotators.
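Conceptually, the annotator's class scores before global pooling form a dense, per-location label map rather than a single image-level prediction. The sketch below illustrates that idea in NumPy; the function name, the top-k truncation, and all shapes are illustrative assumptions, not the authors' code:

```python
import numpy as np

def dense_label_map(score_map: np.ndarray, top_k: int = 5):
    """Turn a classifier's spatial score map of shape (C, H, W) into a
    localized multi-label annotation: apply a softmax over the class axis
    at every spatial location, then keep only the top-k class scores per
    location (a common trick to keep stored label maps small).
    """
    # numerically stable softmax over the class axis
    e = np.exp(score_map - score_map.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)          # (C, H, W)
    # indices and values of the k most probable classes per location
    top_idx = np.argsort(probs, axis=0)[::-1][:top_k]  # (k, H, W)
    top_val = np.take_along_axis(probs, top_idx, axis=0)
    return top_idx, top_val
```

A region dominated by one object would then carry that object's class at its locations, while a region containing a second object would carry the second class there instead.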

The ReLabel approach allows machine annotators to generate location-wise multi-labels. The new LabelPooling framework trains the image classifier using these localized multi-labels: rather than relying only on predictions after the final pooling layer, it pools the label map over each training region, providing additional location-specific supervision signals. ReLabel yields steady gains across ImageNet benchmarks, transfer-learning tasks, and multi-label classification tasks.

The team used the state-of-the-art classifiers EfficientNet-L2, trained with JFT-300M, and ResNeXt-101 32x, trained with InstagramNet-1B. They selected EfficientNet-L2 as their machine annotator because it produced the largest boost in ResNet-50 top-1 classification accuracy. On ImageNet classification, training ResNet-50 with ReLabel achieved a top-1 accuracy of 78.9%, a +1.4 percentage-point gain over the baseline model trained with the original labels.

Even though the machine annotator itself is expensive to pre-train, generating the new labels with the proposed ReLabel method is a one-off cost of only 10 GPU-hours, a small overhead compared with the roughly 328 GPU-hours required to train ResNet-50.