Large language models (LMs) like GPT-3 are trained to predict the next token given the preceding text. When this simple objective is combined with a large enough dataset and model, the result is a very flexible LM that can “read” any text input and conditionally “write” text that could plausibly follow it. The original GPT-3 paper popularised in-context learning (ICL) as a way to use language models to learn tasks from only a few examples.
Properly trained models can map a sequence of in-context input-output pairs to accurate predictions for new inputs. ICL requires the neural network to build an implicit map from the in-context examples to a prediction without any change to the model’s underlying parameters.
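To make the setup concrete, the sketch below samples one linear regression task and lays its examples out as a single prompt: (x_1, y_1, ..., x_n, y_n, x_query), with the query's label withheld for the model to predict. The token embedding used here (an x-token is [0, x], a y-token is [y, 0, ..., 0]) and the function names `sample_linear_task` and `build_icl_sequence` are hypothetical illustrations, not necessarily the exact format used in the study.

```python
import numpy as np

def sample_linear_task(d: int, n: int, noise_std: float = 0.0, rng=None):
    """Sample one task: w ~ N(0, I_d), x_i ~ N(0, I_d), y_i = w . x_i + noise.

    Returns n context inputs plus one extra query input (n + 1 rows in total).
    """
    rng = np.random.default_rng() if rng is None else rng
    w = rng.standard_normal(d)
    x = rng.standard_normal((n + 1, d))
    y = x @ w + noise_std * rng.standard_normal(n + 1)
    return w, x, y

def build_icl_sequence(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Interleave (x_1, y_1, ..., x_n, y_n, x_query) as rows of a (2n + 1, d + 1) array.

    Hypothetical embedding: x-tokens are [0, x], y-tokens are [y, 0, ..., 0].
    The last row is the query input; its label y[n] is deliberately left out.
    """
    n = x.shape[0] - 1
    d = x.shape[1]
    seq = np.zeros((2 * n + 1, d + 1))
    for i in range(n):
        seq[2 * i, 1:] = x[i]        # context input token
        seq[2 * i + 1, 0] = y[i]     # context label token
    seq[2 * n, 1:] = x[n]            # query input token, no label
    return seq

w, x, y = sample_linear_task(d=4, n=8, rng=np.random.default_rng(0))
seq = build_icl_sequence(x, y)       # shape (17, 5): 8 pairs plus one query
```

A model trained on many such sequences, each with a freshly sampled w, can only reduce its error on the query by inferring w from the context, which is what makes this a clean testbed for ICL.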
In a new study, researchers from Google, MIT CSAIL, and Stanford University test the hypothesis that some instances of ICL can be understood as implicit implementations of known learning algorithms. Under this hypothesis, an in-context learner encodes an implicit, context-dependent model in its hidden activations and trains that model on the in-context examples in the course of computing those activations.
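The hypothesis can be illustrated with an explicit stand-in for the "implicit model": a linear model whose weights are created from scratch for each prompt and fitted by a few steps of gradient descent on that prompt's examples only. This is a minimal sketch, assuming the implicit learner is plain full-batch gradient descent on squared loss starting from w = 0; the function name `gd_in_context_predict` and its defaults are hypothetical.

```python
import numpy as np

def gd_in_context_predict(x_ctx, y_ctx, x_query, steps=1, lr=0.1):
    """Predict the query label by fitting a linear model to the context only.

    Runs `steps` of full-batch gradient descent on (1 / 2n) * ||X w - y||^2,
    starting from w = 0. The weights w live only inside this call, so nothing
    persistent is updated, mirroring ICL's fixed model parameters.
    """
    n, d = x_ctx.shape
    w = np.zeros(d)
    for _ in range(steps):
        residual = x_ctx @ w - y_ctx          # shape (n,)
        grad = x_ctx.T @ residual / n         # gradient of the mean squared loss / 2
        w = w - lr * grad
    return float(x_query @ w), w

pred, w = gd_in_context_predict(np.eye(2), np.array([1.0, 2.0]), np.array([1.0, 1.0]), steps=1, lr=1.0)
```

With a single step from w = 0 the fitted weights are just lr * X^T y / n, a scaled moment vector; more steps move the prediction toward the least-squares fit of the context.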
In contrast to earlier work, the main goal of this study is to understand not just which functions ICL can learn but also how it learns them: the specific inductive biases and algorithmic properties of transformer-based ICL.
They study how transformer-based predictors behave on a restricted class of learning problems, namely linear regression. The team shows by construction that a transformer decoder needs only a small number of layers and hidden units to train linear models in context, for instance by implementing gradient descent or closed-form ridge regression.
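The study's constructions show that a transformer can emulate learning algorithms for linear models, closed-form ridge regression among them. The snippet below is not a transformer; it simply computes the ridge predictor, w = (X^T X + lam * I)^{-1} X^T y, that such a construction would have to reproduce from the context. The name `ridge_in_context_predict` and the default `lam` are hypothetical choices for illustration.

```python
import numpy as np

def ridge_in_context_predict(x_ctx, y_ctx, x_query, lam=0.1):
    """Closed-form ridge regression fitted to the context examples only.

    Solves (X^T X + lam * I) w = X^T y rather than forming an explicit inverse,
    then returns the query prediction and the fitted weights.
    """
    d = x_ctx.shape[1]
    gram = x_ctx.T @ x_ctx + lam * np.eye(d)
    w = np.linalg.solve(gram, x_ctx.T @ y_ctx)
    return float(x_query @ w), w

rng = np.random.default_rng(1)
x_ctx = rng.standard_normal((20, 3))
w_true = np.array([1.0, -2.0, 0.5])
pred, w = ridge_in_context_predict(x_ctx, x_ctx @ w_true, np.ones(3), lam=1e-8)
```

With noiseless data, more examples than dimensions, and a tiny `lam`, the fitted weights recover `w_true` and the query prediction is close to 1 - 2 + 0.5 = -0.5.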
They also examine the empirical properties of trained in-context learners. They start by constructing linear regression problems in which the learner’s behavior is underdetermined by the training data (so different valid learning rules give different predictions on held-out data).
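Why underdetermined problems are informative: when there are fewer context examples than input dimensions, many weight vectors fit the context perfectly, yet they disagree on a new query. The sketch below, assuming a noiseless task with d = 8 and n = 4, compares three rules: minimum-norm least squares, the same solution shifted along a null-space direction of the context inputs, and ridge regression with a sizable penalty. Which of these an in-context learner's predictions match reveals which rule it is implicitly following.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 4                                   # fewer examples than dimensions
x_ctx = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y_ctx = x_ctx @ w_true
x_query = rng.standard_normal(d)

# Rule 1: minimum-norm least squares (also the limit of gradient descent from w = 0).
w_min_norm = np.linalg.pinv(x_ctx) @ y_ctx

# Rule 2: add a vector from the null space of x_ctx; the fit on the context is unchanged.
_, _, vt = np.linalg.svd(x_ctx)               # vt is (d, d); rows n..d-1 span the null space
w_shifted = w_min_norm + 3.0 * vt[-1]

# Rule 3: ridge regression with a large penalty (no longer fits the context exactly).
lam = 1.0
w_ridge = np.linalg.solve(x_ctx.T @ x_ctx + lam * np.eye(d), x_ctx.T @ y_ctx)

for name, w in [("min-norm", w_min_norm), ("null-shifted", w_shifted), ("ridge", w_ridge)]:
    train_err = np.max(np.abs(x_ctx @ w - y_ctx))
    print(f"{name:13s} max context error={train_err:.1e}  query prediction={x_query @ w:+.3f}")
```

The first two rules both interpolate the context, so context accuracy alone cannot distinguish them; only their held-out predictions differ.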
Their study shows that model predictions closely match those of existing predictors, such as gradient descent, ridge regression, and exact least squares, and that the models switch between these predictors as model depth and training-set noise change. At large hidden sizes and depths, they behave like Bayesian predictors.
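The Bayesian limit connects back to ridge regression. Assuming task weights are drawn from a Gaussian prior w ~ N(0, tau^2 I) and labels carry Gaussian noise with variance sigma^2, the Bayes-optimal prediction under squared loss uses the posterior mean E[w | X, y] = (X^T X + (sigma^2 / tau^2) I)^{-1} X^T y, which is ridge regression with lam = sigma^2 / tau^2. The sketch below checks this numerically by computing the posterior mean through Gaussian conditioning and comparing it with ridge; the prior and noise variances are assumed values chosen for illustration.

```python
import numpy as np

def bayes_posterior_mean(x_ctx, y_ctx, prior_var, noise_var):
    """Posterior mean of w for w ~ N(0, prior_var * I), y = X w + eps, eps ~ N(0, noise_var * I).

    Uses Gaussian conditioning: E[w | y] = Cov(w, y) Cov(y)^{-1} y.
    """
    n = x_ctx.shape[0]
    cov_y = prior_var * x_ctx @ x_ctx.T + noise_var * np.eye(n)
    return prior_var * x_ctx.T @ np.linalg.solve(cov_y, y_ctx)

def ridge_weights(x_ctx, y_ctx, lam):
    """Closed-form ridge regression weights."""
    d = x_ctx.shape[1]
    return np.linalg.solve(x_ctx.T @ x_ctx + lam * np.eye(d), x_ctx.T @ y_ctx)

rng = np.random.default_rng(0)
prior_var, noise_var = 1.0, 0.25              # assumed tau^2 and sigma^2
x_ctx = rng.standard_normal((6, 10))
w_sample = rng.normal(0.0, np.sqrt(prior_var), 10)
y_ctx = x_ctx @ w_sample + rng.normal(0.0, np.sqrt(noise_var), 6)

w_bayes = bayes_posterior_mean(x_ctx, y_ctx, prior_var, noise_var)
w_ridge = ridge_weights(x_ctx, y_ctx, lam=noise_var / prior_var)
```

The two computations agree because X^T (X X^T + lam I)^{-1} = (X^T X + lam I)^{-1} X^T; the noise level of the training tasks therefore sets the effective regularization a Bayes-optimal in-context learner should use.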
They also ran experiments to probe the algorithmic features of model predictions. The results suggest that important intermediate quantities computed by learning algorithms for linear models, such as parameter vectors and moment matrices, can be decoded from the hidden activations of in-context learners.
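Decoding a quantity from hidden activations is commonly done with a probe: a simple model trained to predict the target quantity from the activations, where high held-out accuracy indicates the quantity is (linearly) recoverable. The sketch below shows the targets (the moment vector X^T y, the moment matrix X^T X, and a ridge parameter vector) and a least-squares linear probe. It does not use real transformer activations: the "hidden states" are a synthetic stand-in built as a random linear mixture of the ridge weights plus noise, so the sketch only demonstrates the probing procedure, not the study's findings. All names, sizes, and the probe type are assumptions.

```python
import numpy as np

def probe_targets(x_ctx, y_ctx, lam=0.1):
    """Intermediate quantities a linear-model learner might compute from one prompt."""
    d = x_ctx.shape[1]
    moment_vec = x_ctx.T @ y_ctx                                 # X^T y, shape (d,)
    moment_mat = x_ctx.T @ x_ctx                                 # X^T X, shape (d, d)
    w = np.linalg.solve(moment_mat + lam * np.eye(d), moment_vec)  # ridge parameter vector
    return moment_vec, moment_mat, w

def with_bias(h):
    """Append a constant column so the probe can learn an offset."""
    return np.hstack([h, np.ones((h.shape[0], 1))])

def fit_linear_probe(h_train, t_train, h_test, t_test):
    """Least-squares linear probe from hidden states to targets; returns pooled test R^2."""
    coef, *_ = np.linalg.lstsq(with_bias(h_train), t_train, rcond=None)
    pred = with_bias(h_test) @ coef
    ss_res = np.sum((t_test - pred) ** 2)
    ss_tot = np.sum((t_test - t_test.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for activations (NOT real model data).
rng = np.random.default_rng(0)
d, n, hidden, prompts = 4, 10, 32, 400
mix = rng.standard_normal((d, hidden))
targets, states = [], []
for _ in range(prompts):
    x_ctx = rng.standard_normal((n, d))
    y_ctx = x_ctx @ rng.standard_normal(d)
    _, _, w = probe_targets(x_ctx, y_ctx)
    targets.append(w)
    states.append(w @ mix + 0.05 * rng.standard_normal(hidden))
targets, states = np.array(targets), np.array(states)

r2 = fit_linear_probe(states[:300], targets[:300], states[300:], targets[300:])
```

With real activations, one would fit the probe per layer and compare how well each candidate quantity can be decoded; a nonlinear probe is an alternative when linear decoding fails.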
The researchers believe that a full description of which learning algorithms deep networks use (or could use) could improve the theoretical understanding of their strengths and weaknesses and the practical understanding of how to train them best.
This study lays the groundwork for such a characterization: some in-context learning appears to use well-known algorithms that transformers discover and implement purely from sequence modeling tasks. The researchers also plan to investigate further which kinds of pretraining data can support in-context learning.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring the new advancements in technologies and their real-life application.