This AI Paper Unlocks the Secret of In-Context Learning: How Language Models Encode Functions into Vector Magic

The paper identifies a neural mechanism in autoregressive transformer language models that represents an input-output function as a compact vector, termed a function vector (FV). Applying causal mediation analysis to a diverse set of in-context-learning tasks, the authors find that a small number of attention heads transport FVs. These FVs remain robust across varied contexts and can trigger task execution in zero-shot and natural-text settings. They carry information about a function's output space, and they can be summed to trigger new, composite tasks, suggesting that LLMs contain internal abstractions of general-purpose functions.

Researchers from Northeastern University extend the study of in-context learning (ICL) in LLMs, probing inside transformers to uncover the existence of FVs. Their work draws on numerous related studies, including those on ICL prompt forms, meta-learning models, and Bayesian task inference, as well as research on decoding the vocabulary of transformer representations and analyses of in-context copying behavior. To isolate FVs, they employ causal mediation analysis methods developed by Pearl and others.


The study investigates whether FVs exist in large autoregressive transformer language models trained on extensive natural-text data. It extends the concept of ICL and explores the mechanisms in transformers that give rise to FVs, building on previous ICL research into prompt forms and scaling. FVs are introduced as compact vector representations of input-output tasks. Causal mediation analysis both identifies FVs and characterizes them, including their robustness to context changes and their potential for semantic composition.
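The causal mediation step can be illustrated with a minimal sketch. This is not the authors' code: the "model" below is a stub whose answer score depends on a single hidden vector, and all names (`run_model`, `mean_head_activation`, the head activations themselves) are invented. The point is the patching logic: average an attention head's output over clean ICL prompts, substitute that average into a corrupted (task-signal-free) run, and measure how much of the clean behavior is recovered.

```python
# Toy activation-patching sketch (invented stub, not the paper's code).

def mean_head_activation(head_outputs):
    """Average one head's output vector over many clean ICL prompts."""
    dim = len(head_outputs[0])
    n = len(head_outputs)
    return [sum(v[i] for v in head_outputs) / n for i in range(dim)]

def run_model(hidden, task_direction):
    """Stub model: the 'correct-answer score' is the dot product of the
    hidden state with a fixed task direction."""
    return sum(h * t for h, t in zip(hidden, task_direction))

# One attention head's activations across three clean ICL prompts for a task.
clean_head_outputs = [[1.0, 0.25], [0.75, 0.0], [1.25, -0.25]]
task_direction = [1.0, 0.0]

corrupted_hidden = [0.0, 0.5]   # shuffled-label prompt: task signal destroyed
patched_hidden = mean_head_activation(clean_head_outputs)  # patch in the mean

corrupted_score = run_model(corrupted_hidden, task_direction)
patched_score = run_model(patched_hidden, task_direction)

# The head's causal (indirect) effect is the score recovered by the patch.
causal_effect = patched_score - corrupted_score
```

Heads whose patched-in mean activation recovers a large share of the clean behavior are the candidates that transport the task's FV.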

The method applies causal mediation analysis to locate FVs in autoregressive transformer language models. It tests whether hidden states encode the task and evaluates natural-text portability by measuring the accuracy of generated outputs. Over 40 tasks are constructed to test FV extraction in various settings, with six representative tasks examined in detail. The paper also references prior research on ICL and on function representations in language models.
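Once per-head causal effects are in hand, an FV for a task can be assembled. A minimal sketch, with invented head names and toy two-dimensional activations: sum the task-conditioned mean outputs of the few attention heads with the largest causal effect, ignoring the rest.

```python
# Hedged sketch of FV assembly: sum the task-conditioned mean outputs of the
# top-k attention heads ranked by causal effect. Names and values are toy.

def extract_fv(head_means, causal_effects, k=2):
    """head_means: dict head -> mean activation vector over ICL prompts.
    causal_effects: dict head -> scalar indirect effect. Sum the top-k heads."""
    top = sorted(causal_effects, key=causal_effects.get, reverse=True)[:k]
    dim = len(next(iter(head_means.values())))
    return [sum(head_means[h][i] for h in top) for i in range(dim)]

head_means = {"L9.H3": [0.5, 0.125], "L12.H7": [0.25, -0.125],
              "L2.H0": [9.0, 9.0]}                  # low-effect head: excluded
causal_effects = {"L9.H3": 0.8, "L12.H7": 0.6, "L2.H0": 0.01}

fv = extract_fv(head_means, causal_effects, k=2)
```

Restricting the sum to the highest-effect heads is what keeps the resulting vector compact and task-specific.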

The experiments confirm FVs in autoregressive transformer language models through causal mediation analysis. FVs serve as compact, context-robust task representations that can trigger specific procedures in diverse settings. Their causal effects are strongest in the middle layers of the network, and they are amenable to semantic vector composition for complex tasks. The approach outperforms alternative methods, indicating that LLMs possess versatile internal function abstractions applicable across contexts.

The proposed approach successfully identifies FVs within autoregressive transformer language models through causal mediation analysis. These compact representations of input-output tasks are robust across different contexts and exhibit strong causal effects in the middle layers of the models. While FVs often contain information encoding the function's output space, reconstructing their full causal effect from that information alone proves more intricate. Furthermore, FVs can be summed to trigger new composite tasks, demonstrating potential for semantic vector composition. The findings suggest that LLMs harbor internal abstractions of general-purpose functions that transfer across contexts.
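The two uses described above can be sketched in a few lines. This is an assumed interface, not the paper's code: vectors are plain Python lists standing in for hidden states in the model's residual stream, and the function names (`add_fv`, `compose`) are invented. It shows injecting an FV into a middle-layer hidden state on a zero-shot prompt, and composing two FVs by simple vector addition.

```python
# Minimal sketch (assumed interface): FV injection and FV composition.

def add_fv(hidden_state, fv, scale=1.0):
    """Add a function vector into one layer's hidden state."""
    return [h + scale * f for h, f in zip(hidden_state, fv)]

def compose(fv_a, fv_b):
    """Semantic composition: sum two FVs (e.g. 'antonym' + 'capitalize')."""
    return [a + b for a, b in zip(fv_a, fv_b)]

hidden = [0.0, 0.0, 1.0]        # hidden state on a zero-shot prompt (toy)
fv_antonym = [1.0, 0.0, 0.0]    # toy stand-ins for extracted FVs
fv_capitalize = [0.0, 1.0, 0.0]

patched = add_fv(hidden, fv_antonym)          # triggers the single task
combined = add_fv(hidden, compose(fv_antonym, fv_capitalize))  # composite task
```

In a real model the addition would be applied via a forward hook at a chosen middle layer; the list arithmetic here only mirrors the operation's shape.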

Future research directions include delving into the internal structure of FVs to discern what information they encode and how they contribute to execution, their utility in complex tasks, and their potential for composability. Exploring the generalizability of FVs across models, tasks, and layers is also important, as are comparative studies with other FV construction methods and investigations into their relationship with other task-representation techniques. Finally, applying FVs to natural language processing tasks such as text generation and question answering warrants further exploration.


Check out the Paper, Github, and Project. All Credit For This Research Goes To the Researchers on This Project.



Hello, My name is Adnan Hassan. I am a consulting intern at Marktechpost and soon to be a management trainee at American Express. I am currently pursuing a dual degree at the Indian Institute of Technology, Kharagpur. I am passionate about technology and want to create new products that make a difference.
