MIT Researchers Create ‘ExSum’: A Mathematical Framework To Evaluate Explanations Of Machine Learning Models And Quantify How Well People Understand Them

This article is based on the research paper 'ExSum: From Local Explanations to Model Understanding'. All credit for this research goes to the researchers.


Machine learning is frequently referred to as a “black box” because the relationship between input and output becomes increasingly opaque as a model’s complexity grows. People’s understanding of these models is often confined to what data goes in and what decisions come out, with little clarity on how the predictions are actually made. While previous research has examined how accurate the explanations produced for these models are, the question of how quickly and reliably individuals can grasp them remains largely unexplored. Interpretability methods are being developed to better understand the functioning of black-box models, which is necessary for their reliable deployment.

As a stepping stone in this field, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and Microsoft Research have developed a mathematical framework called explanatory summary (ExSum) for evaluating and quantifying how well individuals understand machine learning models. ExSum exposes flaws in existing practice and aids in developing accurate model knowledge by surfacing easily missed aspects of a model’s behavior. The framework also accommodates other desirable properties of explanations, such as human alignment, robustness, and counterfactual minimality. The findings will be presented at the Conference of the North American Chapter of the Association for Computational Linguistics.

Efforts to understand machine learning models fall into two main approaches. The first is to find a simpler model that replicates the original model’s predictions while using transparent reasoning processes; because of the complexity of current neural networks, however, this technique falls short. The other approach is to underline words in a text to indicate their significance to one model prediction, then extrapolate from these local explanations to overall model behavior. When a movie review is classified as having positive sentiment, one might deduce that positive words like “flawless” and “wonderful” were the most influential, and then conclude that all positive words contribute positively to the model’s predictions. That generalization is not always correct.
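To make the second approach concrete, here is a deliberately simplified sketch of a word-level local explanation. The word weights and sentences below are invented toy data, and a plain bag-of-words linear model stands in for the neural classifiers and model-agnostic explanation methods (such as LIME or SHAP) used in practice; for a linear model, each word’s contribution to a prediction is simply its learned weight.

```python
# Toy local explanation for a bag-of-words sentiment model.
# The weights here are hypothetical, chosen only for illustration.
weights = {"flawless": 1.4, "wonderful": 1.1, "boring": -1.2, "the": 0.0}

def explain(review: str) -> list:
    """Return (word, contribution) pairs for each word in the review.

    Words absent from the model's vocabulary contribute 0.0.
    """
    return [(w, weights.get(w, 0.0)) for w in review.lower().split()]

contributions = explain("the flawless acting was wonderful")
# The most influential word for this positive prediction:
top = max(contributions, key=lambda pair: pair[1])
```

For this review, “flawless” carries the largest positive contribution, which is exactly the kind of single-instance observation that tempts users into overly broad generalizations about the model.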

ExSum examines a rule across the entire dataset, not just the single instance for which it was built. The user can then apply three metrics to see whether the rule holds up: coverage, validity, and sharpness. Coverage indicates how broadly the rule applies across the dataset, validity is the fraction of individual examples that agree with the rule, and sharpness measures how precise the rule’s prediction is.

ExSum also lets users write specific rules to probe how a model behaves. If one suspects a model is gender-biased, one can write rules asserting that male pronouns contribute positively and female pronouns contribute negatively; a high validity score for these rules suggests the assumptions are correct and the model is likely biased. ExSum can likewise reveal unexpected insights into a model’s behavior. When analyzing a movie-review classifier, for example, the researchers were surprised to discover that negative words contribute more to the model’s predictions than positive words do. According to the team, this kind of fine-grained understanding had not been surfaced for prior state-of-the-art models.
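The three metrics can be sketched in code. The data structures and the sharpness normalization below are assumptions made for illustration, not the ExSum framework’s actual definitions: a rule is modeled as an applicability predicate plus a predicted saliency interval, coverage is the fraction of units the rule applies to, validity is the fraction of covered units whose saliency falls in the predicted interval, and sharpness is taken to shrink as the predicted interval widens over an assumed [-1, 1] saliency range.

```python
# Minimal sketch of ExSum-style rule metrics (hypothetical data model;
# the real framework defines these quantities more formally).
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Unit:
    word: str
    saliency: float  # local-explanation score for this word

@dataclass
class Rule:
    applies: Callable[[Unit], bool]  # which units the rule covers
    low: float                       # predicted saliency interval [low, high]
    high: float

def evaluate(rule: Rule, units: List[Unit]) -> Tuple[float, float, float]:
    covered = [u for u in units if rule.applies(u)]
    coverage = len(covered) / len(units)
    valid = [u for u in covered if rule.low <= u.saliency <= rule.high]
    validity = len(valid) / len(covered) if covered else 0.0
    # Assumed normalization: a point prediction over the full [-1, 1]
    # saliency range has sharpness 1; the full range has sharpness 0.
    sharpness = 1.0 - (rule.high - rule.low) / 2.0
    return coverage, validity, sharpness

# Toy rule: "negative words have negative saliency" on invented data.
units = [Unit("flawless", 0.6), Unit("awful", -0.7),
         Unit("boring", -0.4), Unit("the", 0.05)]
negative_words = {"awful", "boring", "dull"}
rule = Rule(lambda u: u.word in negative_words, -1.0, 0.0)
cov, val, sharp = evaluate(rule, units)
```

Here the rule covers half the units (coverage 0.5), every covered unit agrees with it (validity 1.0), but predicting only “somewhere in [-1, 0]” is not very sharp (0.5), illustrating the trade-off the three metrics are meant to expose.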

The team believes that releasing this approach will prompt researchers to reconsider how they think about machine learning model explanations. They hope researchers will not focus solely on finding accurate local explanations for their models but will also consider how understandable those explanations are.

They hope to expand this work in the future to cover more criteria and explanation formats, such as counterfactual explanations. Furthermore, because the entire process currently requires human involvement, there is substantial opportunity to improve the framework and its user interface by speeding up the process so individuals can build rules faster. The research is funded in part by the National Science Foundation.