Latest MIT Research Tests Whether Popular Methods for Understanding Machine Learning Models Are Working Correctly

Image classification refers to the practice of categorizing or labeling pixels or vectors within an image based on specific rules. Even when a machine-learning model appears to be working well, it may be focusing on visual attributes that are only coincidentally correlated with particular objects rather than on genuine indicators of those objects.

To check for this, many researchers apply “feature-attribution methods” to these models. These procedures are supposed to indicate which image elements matter most for the neural network’s prediction. But researchers have no way of knowing whether the methods themselves work, because they don’t know which features are significant to begin with.

To address this issue, MIT researchers have introduced a method for modifying the original data so that they know which features are genuinely significant to the model. They then use this modified dataset to test whether feature-attribution algorithms correctly identify those features.

They found that even the most popular approaches frequently miss the significant features in an image, and that some barely outperform a random baseline. This could have far-reaching consequences, particularly if neural networks are used in high-stakes scenarios such as medical diagnosis. If the network isn’t working properly and the attempts to catch such failures aren’t working either, human experts may be unaware that a faulty model is misleading them.

Therefore, before relying on a feature-attribution method to argue that a model is working correctly, one should make sure that the feature-attribution method itself is valid.

Each pixel in an image is a feature that the neural network can use to make predictions, so in image classification there are millions of features it could focus on. For example, researchers who want to build an algorithm to help aspiring photographers improve could train a model to distinguish images taken by skilled photographers from those shot by casual tourists. This model could be used to assess how closely amateur photographs resemble professional ones, and to give specific recommendations on how to improve them. During training, the researchers would want this model to learn the artistic qualities of professional images, such as color space, composition, and postprocessing. However, a professionally shot photo is likely to carry a watermark of some sort, so the model could simply learn to look for the watermark instead.

It’s tempting to examine the model with feature-attribution methods, but there’s no guarantee that they will work, because the model could be relying on the artistic features, the watermark, or any other characteristic.

There is no way to know in advance what the spurious correlations in a dataset are. Even if such features aren’t perceptible to humans, a neural network can probably extract them and use them to classify images.

Therefore, the researchers altered the dataset to weaken all correlations between the original image and the data labels, ensuring that none of the original features would matter. They then added a new feature to each image that is so obvious that the neural network must focus on it to make its prediction, such as bright rectangles in a different color for each image class.
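The dataset modification described above can be illustrated with a minimal sketch. It is not the researchers' code: the function name `build_modified_dataset`, the random label assignment, the patch size, and the color palette are all assumptions made for illustration. The idea is only to show how labels can be decoupled from image content and then tied to a single, known feature.

```python
import numpy as np

def build_modified_dataset(images, num_classes, patch=8, seed=0):
    """Decouple labels from content, then stamp a class-colored rectangle.

    images: float array of shape (N, H, W, 3) with values in [0, 1].
    Returns (modified_images, labels, masks); masks marks the stamped pixels.
    All names and defaults here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n, h, w, _ = images.shape
    # Random labels carry no information about the original image content.
    labels = rng.integers(0, num_classes, size=n)
    # One bright color per class (an arbitrary palette for illustration).
    palette = rng.uniform(0.5, 1.0, size=(num_classes, 3))
    out = images.copy()
    masks = np.zeros((n, h, w), dtype=bool)
    for i in range(n):
        top = rng.integers(0, h - patch + 1)
        left = rng.integers(0, w - patch + 1)
        # The rectangle is now the only feature that predicts the label.
        out[i, top:top + patch, left:left + patch, :] = palette[labels[i]]
        masks[i, top:top + patch, left:left + patch] = True
    return out, labels, masks
```

Because the mask of stamped pixels is returned alongside each image, the location of the one feature that matters is known exactly, which is what makes a later check of an attribution method possible.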

They used this approach to test a variety of feature-attribution methods. For image classification, these methods produce a saliency map, which shows how the important features are distributed across the image. For example, if the neural network is classifying photos of birds, the saliency map might show that 80% of the important features are concentrated around the bird’s beak.
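To make the idea of a saliency map concrete, here is a small sketch of occlusion-based saliency, one simple attribution technique that is not necessarily among those the researchers tested. The stand-in model `toy_score`, the window size, and the fill value are assumptions chosen for illustration only.

```python
import numpy as np

def occlusion_saliency(score_fn, image, window=4, fill=0.0):
    """Saliency as the score drop when each window x window block is hidden."""
    h, w = image.shape[:2]
    base = score_fn(image)
    saliency = np.zeros((h, w))
    for top in range(0, h, window):
        for left in range(0, w, window):
            occluded = image.copy()
            occluded[top:top + window, left:left + window] = fill
            # A large drop means the score depended on this block.
            saliency[top:top + window, left:left + window] = base - score_fn(occluded)
    return saliency

# Toy stand-in for a trained model: it only looks at the top-left corner.
def toy_score(img):
    return float(img[:4, :4].mean())

image = np.ones((8, 8))
sal = occlusion_saliency(toy_score, image)
# Share of the total saliency that falls in the top-left corner.
share = sal[:4, :4].sum() / sal.sum()
print(share)
```

Since the toy model ignores everything outside the top-left corner, all of the saliency lands there. A statement such as "80% of the important features are around the beak" is the same kind of share computed over a region of a real saliency map.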

After removing all the correlations in the image data, they manipulated the images in several ways, such as obscuring areas of the image, altering the brightness, or adding a watermark. If the feature-attribution method is working correctly, nearly all of the important features should be located in the area the researchers manipulated.
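One way to quantify this check is to measure what fraction of a saliency map's total attribution falls inside the manipulated region and compare it with a random map. The sketch below is an assumed formulation, not the metric from the paper; the helper name `attribution_in_region` and the synthetic maps are hypothetical.

```python
import numpy as np

def attribution_in_region(saliency, mask):
    """Fraction of total absolute attribution that falls inside `mask`."""
    weights = np.abs(saliency)
    total = weights.sum()
    return float(weights[mask].sum() / total) if total > 0 else 0.0

rng = np.random.default_rng(0)
mask = np.zeros((32, 32), dtype=bool)
mask[4:12, 4:12] = True  # where the hypothetical manipulation was placed

# A synthetic map that concentrates on the manipulated region.
focused = rng.uniform(0, 0.1, size=(32, 32))
focused[mask] += 1.0
# A random baseline map with no relation to the manipulation.
random_map = rng.uniform(0, 1, size=(32, 32))

focused_score = attribution_in_region(focused, mask)
random_score = attribution_in_region(random_map, mask)
print(focused_score, random_score)
```

A random map is expected to score close to the fraction of the image that the region covers (here 64 of 1,024 pixels), so an attribution method that scores only slightly above that level is barely better than chance.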

All of the feature-attribution methods they examined were better at detecting an anomaly than at detecting the absence of one. In other words, the methods could find a watermark more easily than they could establish that an image contains no watermark. As a result, humans would have a harder time trusting a model that gives a negative prediction.

The team’s work demonstrates the importance of thoroughly testing feature-attribution methods before applying them to a real-world model, particularly in high-stakes circumstances.

The researchers plan to use this evaluation procedure in the future to study more subtle or realistic features that could lead to spurious correlations. Another line of research they want to pursue is helping humans understand saliency maps so that they can make better decisions.

Paper: https://arxiv.org/pdf/2104.14403.pdf

Reference: https://news.mit.edu/2022/test-machine-learning-models-work-0118