Researchers Propose Mitigation Strategies to Tackle Overinterpretation of Deep Learning Methods


While neural networks can be trained to learn, understanding a machine’s decision-making process is challenging. According to MIT researchers, image classifiers are generally graded on their test set accuracy, but high accuracy can disguise a subtle sort of model failure. Even without semantically meaningful information, these models have high accuracy ratings. The classifier is referred to as “Overinterpreted” when it makes a high-confidence decision without any significant supporting input information.

Researchers from MIT put together a paper illustrating that CIFAR-10 and ImageNet-trained neural networks suffer from overinterpretation. Although 95 percent of the input images are obscured, and humans cannot detect significant characteristics in the remaining pixel-subsets, CIFAR-10 models generate confident predictions. 

The research team presents batched Gradient SIS, a new method for discovering suitable input subsets for complicated datasets. This method is used in ImageNet to demonstrate the appropriateness of border pixels for training and testing. Overinterpretation, unlike adversarial instances, relies on total picture pixels. Overinterpretation can be mitigated through ensembling and input dropout.


Model ensembling improves classification performance. The criterion for determining how much ensembling may alleviate overinterpretation is directly the increase in SIS subset size, as pixel-subset size is closely connected with human pixel-subset classification accuracy. As expected, homogeneous assembly improves test accuracy while simultaneously increasing the SIS size, reducing overinterpretation. The ensemble’s SIS subsets are more significant than the SIS of its members.


Both the train and test images are subjected to input dropout. Each input pixel is kept with a probability of p = 0.8, while dropped pixels’ values are set to zero. According to theory, random dropout of input pixels interrupts spurious signals, leading to overinterpretation. There is a modest decrease in CIFAR-10 test accuracy for models regularized using input dropout. On CIFAR-10-C pictures, there is a considerable (6%) increase in OOD test accuracy.

However, it is observed that overinterpretation is a statistical trait of standard benchmarks and that this mitigation is not a substitute for better training data. Surprisingly, the number of pixels in the SIS justification for a given classification is typically a good indicator of whether the image is accurately classified.

The research led by the scientists of MIT proves that one can recover nonsensical features from CIFAR-10 and ImageNet just based on intuition. In datasets, they cling onto non-salient but statistically valid signals. Overinterpretation is revealed as a pathology in the study, and it is claimed that it is a prevalent failure mode of machine learning models.

In addition, by masking over 90% of each image, the researchers developed a pipeline for detecting overinterpretation. This was done through human accuracy tests to demonstrate minimal loss of test accuracy and confirm the lack of saliency in these patterns.

Misclassifications frequently rely on smaller, specious feature subsets, making overinterpretation a severe practical concern. To eliminate overinterpretation artifacts, the answer is to select the training data carefully. This could imply creating datasets in a more controlled setting. It could potentially lead to models being trained with uninformative background objects. 

After studying many datasets, it has become clear that machine learning approaches alone are incapable of resolving the problem of overinterpretation.