MIT Researchers Introduce a Machine Learning Technique that can Automatically Describe the Roles of Individual Neurons in a Neural Network with Natural Language

Even the researchers who build neural networks do not always understand how or why they work so well, even though these networks can beat humans at specific tasks. Understanding how a neural network works could help researchers predict how it will behave in practice, for instance when it is deployed outside the lab to analyze medical images that could help diagnose heart conditions.

Researchers at MIT have devised a mechanism for deciphering the inner workings of black-box neural networks. Neural networks, modeled after the human brain, comprise layers of interconnected nodes, or “neurons,” that process data. The new system can automatically generate descriptions of those neurons in English or any natural language. 

For example, in a neural network trained to recognize animals in photographs, one neuron can be described as detecting fox ears.

Their scalable technique can create more accurate and specific descriptions for individual neurons than other methods.

In a new study, this method is used to audit a neural network to see what it has learned, and even to edit a network by identifying and then turning off unhelpful or inaccurate neurons.

Most current strategies for understanding how a model works either explain the entire neural network or require researchers to specify in advance the concepts that individual neurons might be focusing on.

MILAN (mutual-information-guided linguistic annotation of neurons) improves on previous methods because it does not require a list of concepts in advance and automatically generates natural language descriptions of all the neurons in a network. This matters because a neural network can contain hundreds of thousands of individual neurons.


MILAN generates descriptions of neurons in neural networks trained to perform computer vision tasks such as object detection and image synthesis. To describe a given neuron, the system first examines that neuron's behavior across hundreds of images to find the image regions in which it is most active. It then selects a natural language description for the neuron that maximizes a quantity called pointwise mutual information between the image regions and the descriptions. This encourages descriptions that capture each neuron's distinctive role within the larger network.
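The selection rule can be illustrated with a toy sketch. The probabilities below are illustrative placeholders, not MILAN's learned models; the point is that pointwise mutual information rewards descriptions that are specific to a neuron's active regions rather than merely common overall.

```python
import math

def pmi(p_joint, p_regions, p_desc):
    """Pointwise mutual information: log p(x, y) / (p(x) * p(y))."""
    return math.log(p_joint / (p_regions * p_desc))

# Hypothetical candidate descriptions with placeholder probabilities.
# "animal" co-occurs with the neuron's active regions more often in
# absolute terms, but it is also a far more common description overall.
candidates = {
    "fox ears": {"p_joint": 0.040, "p_desc": 0.05},
    "animal":   {"p_joint": 0.060, "p_desc": 0.30},
    "fur":      {"p_joint": 0.050, "p_desc": 0.15},
}
p_regions = 0.10  # marginal probability of this neuron's active regions

best = max(
    candidates,
    key=lambda d: pmi(candidates[d]["p_joint"], p_regions, candidates[d]["p_desc"]),
)
print(best)  # fox ears
```

Here "fox ears" wins (PMI ≈ 2.08) over the generic "animal" (PMI ≈ 0.69) despite "animal" having the largest joint probability, which is exactly the specificity the paper's objective encourages.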

Compared with other models, MILAN produced more detailed and more accurate descriptions. Still, the researchers were more interested in how it could help answer specific questions about computer vision models.

First, MILAN was used to determine which neurons in a neural network are the most significant. The researchers generated descriptions for every neuron and sorted them by the words they contained. By gradually removing neurons from the network and measuring how the accuracy changed, they found that neurons with two very different words in their descriptions (for example, "vases" and "fossils") were less significant to the network.
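A minimal sketch of this kind of ablation test, assuming a toy two-layer network rather than the paper's actual models or code: zero out one hidden unit at a time and treat the resulting accuracy drop as a measure of that neuron's importance.

```python
import numpy as np

# Toy network: random weights, synthetic labels. The procedure, not the
# model, is the point: ablate each hidden neuron and measure accuracy.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # input -> hidden
W2 = rng.normal(size=(8, 2))   # hidden -> output
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)  # synthetic binary labels

def accuracy(ablate=None):
    h = np.maximum(X @ W1, 0)      # ReLU hidden layer
    if ablate is not None:
        h[:, ablate] = 0.0         # "turn off" one neuron
    preds = (h @ W2).argmax(axis=1)
    return (preds == y).mean()

base = accuracy()
# Importance of neuron j = accuracy lost when j is removed.
importance = {j: base - accuracy(ablate=j) for j in range(8)}
```

Neurons whose removal barely changes accuracy (importance near zero, or even negative) are candidates for the "less significant" group the study identified.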

MILAN was also used to audit models for surprising behavior. For example, the researchers used MILAN to count how many neurons were sensitive to human faces in image classification models trained on datasets from which human faces had been blocked out.
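An audit of this kind can be as simple as a keyword search over the generated descriptions. The neuron descriptions below are hypothetical examples, not MILAN outputs:

```python
# Hypothetical MILAN-style descriptions, keyed by neuron index.
descriptions = {
    0: "fox ears",
    1: "human face in profile",
    2: "blue sky",
    3: "smiling face",
    4: "car wheel",
}

# Count neurons whose description mentions faces.
face_neurons = [i for i, d in descriptions.items() if "face" in d.lower()]
print(len(face_neurons))  # 2
```

A nonzero count on a model trained with faces blocked out is exactly the kind of surprise such an audit is meant to surface.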

In a third experiment, the researchers used MILAN to edit a neural network by identifying and removing neurons that detected spurious correlations in the data, which boosted the network's accuracy by 5 percent on inputs exhibiting the problematic correlation.

While the researchers were impressed by MILAN's performance in these applications, the model sometimes produces descriptions that are too imprecise, or it makes an incorrect guess when it does not recognize the concept a neuron encodes.

The researchers aim to keep improving the richness of the descriptions MILAN can produce. They also plan to adapt MILAN to other kinds of neural networks and, since neurons work together to produce an output, to use it to describe what groups of neurons do. This approach to interpretability starts at the bottom and works its way up, with the goal of producing open-ended, compositional descriptions of function in natural language.

The researchers believe MILAN, and the use of language as an explanatory tool more generally, will be a valuable addition to the interpretability toolkit. For further reading, refer to the paper.

Paper: https://arxiv.org/pdf/2201.11114.pdf

Reference: https://news.mit.edu/2022/explainable-machine-learning-0127