Unlocking the Mysteries of Large Language Models: A Deep Dive into Influence Functions and Their Scalability

Large language models (LLMs) have accelerated progress in many real-world fields and have shown unexpected emergent abilities, including in-context learning and chain-of-thought reasoning. This progress carries risks, however, from near-term concerns such as social bias, data leakage, and disinformation to longer-term risks posed by powerful AI systems. LLMs have also been shown to change along a range of psychological and behavioral dimensions as a function of both scale and the degree of fine-tuning. Navigating these risks requires insight into how the models work.

When an LLM outputs information it knows to be false, correctly solves arithmetic or programming problems, or begs the user not to shut it down, is it merely repeating (or splicing together) passages from the training set? Or is it combining its stored knowledge in new ways and drawing on a detailed world model? The answers to these questions bear heavily on forecasts of how AI capabilities will develop and on strategies for aligning AI systems with human preferences. One bottom-up way to gain insight into a model is to reverse engineer its circuits in detail.

Mechanistic interpretability has uncovered induction heads, a mechanism that implements copying behavior, as well as other mechanisms by which a model might learn uninterpretable superpositions of features. Researchers have also proposed ways in which transformers could implement Hopfield networks, fast weights, sparse regression, gradient descent, automata, or simple computer programs. Although these analyses are informative, they are usually carried out on small, simplified systems. Connecting them to the high-level phenomena that make LLMs so interesting would probably require extensive reverse engineering of a complicated computation involving billions of parameters.

An alternative is to start from the model's input-output relationships and zoom in. The benefit is that phenomena of interest can be studied directly in large models. Unfortunately, it is hard to draw strong conclusions from model samples and probabilities alone, since any given output is consistent with a wide range of learning processes, from simple memorization to genuine problem-solving. Researchers from the University of Toronto and the Vector Institute extend this top-down approach beyond raw samples and probabilities by analyzing how large language models generalize using influence functions. They seek to quantify a counterfactual: how would the model's behavior change if a particular sequence were added to the training set? Influence functions, a classical statistical technique that has been adapted to deep learning, address this question. Specifically, influence functions approximate an infinitesimal version of this counterfactual.
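
For readers who want the counterfactual in symbols, the following is the standard textbook formulation of an influence function, not a derivation taken from the paper itself. It assumes a training objective with a unique minimizer and an invertible Hessian H, which does not strictly hold for large neural networks; the symbols z_m (the added training sequence), f (a measurement of model behavior, such as the log-probability of a completion), and epsilon (the upweighting amount) are notation chosen here for illustration.

```latex
% Parameters after upweighting training example z_m by a small amount \epsilon
\theta^\star(\epsilon) = \arg\min_{\theta} \; \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}(z_i, \theta) + \epsilon\, \mathcal{L}(z_m, \theta)

% Influence of z_m on the parameters: the sensitivity at \epsilon = 0
\mathcal{I}_{\theta^\star}(z_m) = \left.\frac{d\theta^\star}{d\epsilon}\right|_{\epsilon = 0} = -H^{-1} \nabla_{\theta} \mathcal{L}(z_m, \theta^\star),
\qquad H = \nabla_{\theta}^{2}\, \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}(z_i, \theta^\star)

% Influence of z_m on a measurement f of the model's behavior (chain rule)
\mathcal{I}_{f}(z_m) = \nabla_{\theta} f(\theta^\star)^{\top}\, \mathcal{I}_{\theta^\star}(z_m) = -\nabla_{\theta} f(\theta^\star)^{\top} H^{-1} \nabla_{\theta} \mathcal{L}(z_m, \theta^\star)
```

The last line shows why two quantities dominate the cost: the product of the inverse Hessian with the query gradient, and the gradient of every candidate training sequence.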

The researchers argue that this is an important source of evidence for almost any high-level behavior one might want to understand: identifying the training sequences with the greatest influence helps distinguish between competing explanations of how an output was produced and sheds light on which kinds of structure are, or are not, generalized from training examples. While influence functions have yielded insights for some small-scale neural networks, scaling them to large models is difficult. One computational bottleneck is the inverse-Hessian-vector product (IHVP), which typically requires running an iterative linear-system solver for potentially thousands of steps, each costing about as much as a gradient computation.
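
To make the IHVP bottleneck concrete, here is a minimal sketch on a toy ridge regression problem, where the Hessian is small enough to check the answer exactly. It uses conjugate gradient as the iterative solver, which is an assumption made for illustration and not the solver used in the paper. The names `conjugate_gradient`, `hvp`, `x_query`, and the synthetic data are all hypothetical. Each solver step calls `hvp` once, which in a real network would cost roughly one extra gradient-like pass.

```python
import numpy as np

def conjugate_gradient(hvp, b, max_iters=100, tol=1e-10):
    """Solve H x = b using only Hessian-vector products hvp(v) = H v."""
    x = np.zeros_like(b)
    r = b - hvp(x)            # residual of the linear system
    p = r.copy()              # current search direction
    rs_old = r @ r
    for _ in range(max_iters):
        if np.sqrt(rs_old) < tol:
            break
        Hp = hvp(p)           # one Hessian-vector product per step
        alpha = rs_old / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy problem (synthetic data): per-example loss 0.5 * (x_i . theta - y_i)^2,
# plus a damping term 0.5 * lam * ||theta||^2.
rng = np.random.default_rng(0)
N, D = 200, 10
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)
lam = 1e-2

def hvp(v):
    # H v = (X^T X / N + lam * I) v, computed without forming H
    return X.T @ (X @ v) / N + lam * v

# Fitted parameters (closed form is available only because the toy is linear)
theta = np.linalg.solve(X.T @ X / N + lam * np.eye(D), X.T @ y / N)

# Query: the measurement f(theta) = x_query . theta, whose gradient is x_query
x_query = rng.normal(size=D)
grad_f = x_query

# The expensive step: inverse-Hessian-vector product H^{-1} grad_f
ihvp = conjugate_gradient(hvp, grad_f)

# Per-example training gradients, then influence = -grad_f^T H^{-1} grad_L(z_i)
residuals = X @ theta - y
train_grads = residuals[:, None] * X
influences = -train_grads @ ihvp

top = np.argsort(-np.abs(influences))[:5]
print("most influential training indices:", top)
```

In this toy setting the solver converges in at most D steps; for a model with billions of parameters, the same loop is what makes exact influence queries so costly.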

Another bottleneck is the need to compute gradients for every training example under consideration, which normally has to be done separately for each influence query. To date, the largest models to which influence functions have been applied are vision transformers with 300 million parameters. The researchers present a method for scaling influence-function computations to large transformer language models, studying models with up to 52 billion parameters. Their approach relies on new techniques for the two bottlenecks noted above: IHVP computation and training-gradient computation.

They list some of their key conclusions as follows: 

1. Despite being substantially quicker, EK-FAC is competitive with the more established LiSSA method regarding influence estimation accuracy. 

2. The influence distribution is heavy-tailed, with a tail that roughly follows a power law. Nevertheless, influence is spread across many sequences rather than concentrated in a handful, which suggests that typical model behaviors are not the direct result of memorizing a small number of sequences.

3. Compared to smaller models, larger models consistently generalize at a higher degree of abstraction. Role-playing, programming, mathematical reasoning, and cross-linguistic generalization are some examples. 

4. Influence is, on average, spread roughly evenly across the network’s layers. However, different layers show distinct generalization patterns: the middle layers focus on more abstract patterns, while the upper and lower layers are more closely tied to tokens.

5. Despite the sophisticated generalization patterns seen overall, influence functions reveal a surprising sensitivity to word order. In particular, training sequences show meaningful influence only when the words related to the prompt come before those related to the completion.

6. Role-playing behavior is influenced most by examples or descriptions of similar behaviors in the training set, which suggests that the behaviors arise from imitation rather than sophisticated planning.
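
As background to finding 1, the sketch below illustrates why Kronecker-factored curvature approximations make the IHVP cheap. It shows only the plain Kronecker-factored idea for a single linear layer; it omits the eigenvalue correction that gives EK-FAC its name and is not the researchers' implementation. The activations `a`, output-side gradients `g`, the damping value `lam`, and the function `kron_ihvp` are hypothetical stand-ins on random data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_samples = 4, 3, 500

# Hypothetical per-example layer inputs (a) and gradients w.r.t. layer outputs (g)
a = rng.normal(size=(n_samples, n_in))
g = rng.normal(size=(n_samples, n_out))

lam = 1e-3  # damping added to each factor (an assumed value)
A = a.T @ a / n_samples + lam * np.eye(n_in)    # input covariance factor
S = g.T @ g / n_samples + lam * np.eye(n_out)   # output-gradient covariance factor

def kron_ihvp(V):
    """Apply (A kron S)^{-1} to a weight-shaped gradient V of shape (n_out, n_in).

    Uses the identity (A kron S)^{-1} vec(V) = vec(S^{-1} V A^{-1}) for symmetric A,
    so only the two small factors are ever solved against.
    """
    left = np.linalg.solve(S, V)             # S^{-1} V
    return np.linalg.solve(A, left.T).T      # (S^{-1} V) A^{-1}, using A = A^T

V = rng.normal(size=(n_out, n_in))           # stand-in for a query gradient

fast = kron_ihvp(V)

# Reference: build the full (n_in * n_out)-dimensional matrix and solve directly.
full = np.kron(A, S)
vec_V = V.flatten(order="F")                 # column-stacking vec(V)
slow = np.linalg.solve(full, vec_V).reshape(n_out, n_in, order="F")

print("max abs difference:", np.max(np.abs(fast - slow)))
```

The factored solve works with an n_in-by-n_in and an n_out-by-n_out matrix instead of one matrix of size (n_in * n_out) squared, and it needs no iterative solver, which is the kind of saving that makes approximations in this family attractive at scale.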



Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing an undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
