As deep learning models grow in size and complexity, it becomes more difficult to articulate why and how they arrive at a given result. There are several different directions that researchers are exploring to improve the interpretability of AI systems.
Mechanistic interpretability attempts to reverse engineer neural networks in order to explain the algorithms a model implements. This strategy has proven fairly effective for convolutional neural networks used in image classification. Despite these successes, the repertoire of methods for producing mechanistic explanations is limited and poorly understood. A significant stumbling block is that evaluating mechanistic hypotheses demands both creativity and considerable effort from researchers.
Mechanistic hypotheses are typically assessed by combining evidence from numerous ad hoc experiments. Because this is costly, many approaches are tested only on simplified models or on a handful of nontrivial circuits in more realistic models.
A new DeepMind study proposes Tracr (TRAnsformer Compiler for RASP), a compiler that translates human-readable code into the weights of a neural network, directly addressing the shortage of ground-truth explanations. With it, researchers can construct models that perform nontrivial computations with a known implementation. To determine how well various interpretability tools perform, the tools can be applied to these constructed models and the resulting explanations compared against the known ground truth.
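The evaluation idea can be illustrated with a deliberately tiny sketch. Everything in it is a hypothetical stand-in rather than anything from the paper: the "model" is a hand-written linear function whose mechanism is known by construction, and the "interpretability tool" is a simple ablation test that is scored against that known ground truth.

```python
# Hypothetical toy setup: all names are illustrative, not taken from Tracr.

# A hand-built "model" whose mechanism is known by construction:
# the output depends only on inputs 0 and 2.
WEIGHTS = [2.0, 0.0, -1.0, 0.0]
GROUND_TRUTH = {0, 2}  # indices the model actually uses


def model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))


def ablation_attribution(f, samples, tol=1e-9):
    """A simple 'tool': flag input i as used if zeroing it changes
    the output on at least one sample."""
    used = set()
    for i in range(len(samples[0])):
        for x in samples:
            ablated = list(x)
            ablated[i] = 0.0
            if abs(f(x) - f(ablated)) > tol:
                used.add(i)
                break
    return used


def score(explanation, truth):
    """Precision and recall of the tool's explanation against the ground truth."""
    hits = len(explanation & truth)
    precision = hits / len(explanation) if explanation else 1.0
    recall = hits / len(truth) if truth else 1.0
    return precision, recall


samples = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
explanation = ablation_attribution(model, samples)
precision, recall = score(explanation, GROUND_TRUTH)
print(explanation, precision, recall)
```

Because the mechanism was written down before the tool was run, any disagreement between the explanation and the ground truth can be attributed to the tool rather than to uncertainty about the model.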
Tracr converts code written in Restricted Access Sequence Processing (RASP), a domain-specific programming language designed for defining transformer computations, into weights for transformer models. The team also introduces craft, Tracr’s intermediate representation for expressing linear algebra operations in terms of named basis directions.
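To give a feel for the style of program RASP describes, here is a minimal pure-Python imitation of three of its core operations. It is a sketch under stated assumptions, not Tracr's API: the function names and signatures below are hypothetical, and the code simply evaluates on Python lists rather than compiling anything into weights.

```python
# Toy imitation of RASP-style primitives. NOT Tracr's API: every name
# here is hypothetical and chosen for illustration only.

def select(keys, queries, predicate):
    """Boolean matrix: entry [q][k] is predicate(keys[k], queries[q])."""
    return [[predicate(k, q) for k in keys] for q in queries]


def selector_width(selector):
    """For each query position, count how many key positions are selected."""
    return [sum(row) for row in selector]


def aggregate(selector, values, default=0):
    """For each query position, average the selected values
    (or return `default` if nothing is selected)."""
    out = []
    for row in selector:
        picked = [v for v, on in zip(values, row) if on]
        out.append(sum(picked) / len(picked) if picked else default)
    return out


def length(tokens):
    """Sequence length at every position: select everything, then count."""
    everything = select(tokens, tokens, lambda k, q: True)
    return selector_width(everything)


def fraction_of(tokens, target):
    """Fraction of tokens equal to `target`, computed at every position."""
    everything = select(tokens, tokens, lambda k, q: True)
    is_target = [1 if t == target else 0 for t in tokens]
    return aggregate(everything, is_target)


print(length(["a", "b", "c"]))          # [3, 3, 3]
print(fraction_of(list("xaxx"), "x"))   # [0.75, 0.75, 0.75, 0.75]
```

The select step plays the role of an attention pattern and the aggregate step plays the role of attention-weighted averaging, which is why programs written this way map naturally onto transformer layers.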
Focusing on transformer implementations, the researchers use RASP to investigate edge cases, such as data duplicated across multiple storage locations. With Tracr, it is possible to build models in which data is encoded in a known location and so validate the proposed approach. They used Tracr to create models for sorting a sequence of numbers, counting the number of tokens in an input sequence, and checking for balanced parentheses. All of these are much simpler than NLP tasks such as text summarization or question answering, where decoder-only Transformer models are typically employed.
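Two of the tasks mentioned above can be sketched in the same select-and-aggregate style. This is an illustrative toy in plain Python, not the programs from the paper or Tracr's API: all names are hypothetical, the sort assumes the input values are unique, and the parenthesis check handles a single bracket type.

```python
# Toy RASP-style programs. All names are hypothetical, not Tracr's API.

def select(keys, queries, predicate):
    """Boolean matrix: entry [q][k] is predicate(keys[k], queries[q])."""
    return [[predicate(k, q) for k in keys] for q in queries]


def selector_width(selector):
    """Number of selected key positions for each query position."""
    return [sum(row) for row in selector]


def aggregate(selector, values, default=0):
    """Mean of the selected values at each query position."""
    out = []
    for row in selector:
        picked = [v for v, on in zip(values, row) if on]
        out.append(sum(picked) / len(picked) if picked else default)
    return out


def sort_unique(tokens):
    """Sort numbers, ASSUMING all values are unique."""
    positions = list(range(len(tokens)))
    # Target position of a token = how many tokens are smaller than it.
    smaller = select(tokens, tokens, lambda k, q: k < q)
    target_pos = selector_width(smaller)
    # Each output position picks the one token whose target matches it.
    route = select(target_pos, positions, lambda k, q: k == q)
    return aggregate(route, tokens)


def balanced(tokens):
    """At each position, is the prefix ending there balanced?
    The last entry answers the question for the whole sequence."""
    positions = list(range(len(tokens)))
    prefix = select(positions, positions, lambda k, q: k <= q)
    delta = [1 if t == "(" else -1 if t == ")" else 0 for t in tokens]
    # The prefix mean has the same sign as the running nesting depth.
    depth = aggregate(prefix, delta)
    dipped = [1 if d < 0 else 0 for d in depth]
    # Positive wherever some earlier prefix closed more than it opened.
    ever_dipped = aggregate(prefix, dipped)
    return [d == 0 and e == 0 for d, e in zip(depth, ever_dipped)]


print(sort_unique([3, 1, 2]))        # [1.0, 2.0, 3.0]
print(balanced(list("(())"))[-1])    # True
print(balanced(list("())("))[-1])    # False
```

The sort returns floats because this toy aggregate always averages; with unique inputs each output position selects exactly one token, so the mean equals that token.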
The researchers highlight further possible uses of Tracr beyond evaluating interpretability tools. One example is compiling hand-coded implementations of model components and substituting them for parts of a model produced by conventional training. This could lead to better overall model performance.
The researchers hope that adoption of Tracr by the research community will help deepen the understanding of neural networks.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring the new advancements in technologies and their real-life application.