AI Researchers From AI21 Labs Introduce Three Novel Approaches That Specialize A Frozen Language Model For Multiple Tasks

This Article Is Based On The Research 'STANDING ON THE SHOULDERS OF GIANT FROZEN LANGUAGE MODELS'. All Credit For This Research Goes To The Researchers of This Project 👏👏👏

Language models are used in many NLP tasks to estimate the probability of a given sequence of words. They rely on statistical and probabilistic methods to make these predictions.

Language models trained on huge datasets have been shown to deliver strong results on many NLP tasks. One way to use such a model is to keep it “frozen,” leaving its weights untouched. However, frozen-model methods still tend to underperform fine-tuning approaches, which modify the weights in a task-dependent way.

The AI21 Labs team developed three novel strategies for learning small neural modules that specialize a frozen language model to distinct tasks. Their paper, “Standing on the Shoulders of Giant Frozen Language Models,” presents input-dependent prompt tuning, frozen readers, and recursive LMs. According to the paper, these compute-efficient techniques outperform conventional frozen-model methods and challenge fine-tuned performance without sacrificing the model’s versatility.

Many previously published strategies enhance performance on specific tasks by training a small number of parameters around a frozen model. While these strategies can achieve fine-tuning performance for some applications, state-of-the-art performance in many practical scenarios still relies on fine-tuned models.
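As a rough illustration of what "training a small number of parameters around a frozen model" means, here is a minimal Python sketch. It is a toy under stated assumptions, not the paper's method: a fixed linear map stands in for the large LM, and `FROZEN_WEIGHTS`, `frozen_model`, `train_adapter`, and `EXAMPLES` are hypothetical names and data invented for this example.

```python
# Toy illustration only: a fixed linear map stands in for a huge pretrained LM.
# All names and numbers here are hypothetical, not taken from the paper.
FROZEN_WEIGHTS = (0.5, -1.0, 2.0)  # a tuple, so it cannot be updated in place


def frozen_model(features):
    """Frozen backbone: reads its weights but never changes them."""
    return sum(w * x for w, x in zip(FROZEN_WEIGHTS, features))


def train_adapter(examples, epochs=500, lr=0.05):
    """Fit two trainable parameters (scale, bias) on top of the frozen output.

    Uses per-example gradient descent on squared error; only the adapter
    parameters receive updates.
    """
    scale, bias = 1.0, 0.0
    for _ in range(epochs):
        for features, target in examples:
            hidden = frozen_model(features)
            error = scale * hidden + bias - target
            scale -= lr * error * hidden
            bias -= lr * error
    return scale, bias


# Hypothetical task whose targets equal 3 * frozen_model(x) + 1.
EXAMPLES = [
    ((1.0, 0.0, 0.0), 2.5),
    ((0.0, 1.0, 0.0), -2.0),
    ((0.0, 0.0, 1.0), 7.0),
    ((1.0, 1.0, 0.0), -0.5),
]

scale, bias = train_adapter(EXAMPLES)
print(round(scale, 3), round(bias, 3))
```

On this toy task the adapter should settle near a scale of 3 and a bias of 1, while the backbone's weights stay exactly as they were. Real frozen-model methods differ in where the small trainable component sits (for example, in the prompt rather than at the output), but the division of labor is the same.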

The researchers argue that versatile natural language interfaces can be built on top of frozen LMs. To show this, they design more ambitious external scaffolding that extracts more from a frozen LM. Their key observation is that existing frozen-LM techniques are so small that there is room to expand them greatly at a cost that is still low compared with a single pass through the massive LM.

The team focuses on two settings where the go-to standard is still fine-tuned models.

  1. Massive multitasking: asking a single model to address many NLP tasks simultaneously. Existing multitask models are all fine-tuned; no frozen-model method has been considered in this setting.
  2. Challenging individual tasks, in which the leading methods are all fine-tuned. This includes open-domain question answering, in which a model is asked to answer general-knowledge questions.

The researchers demonstrate that a single frozen LM can compete with current fine-tuning methods in demanding settings such as massive multitasking and open-domain question answering. They do so by employing three distinct frozen-model strategies. Their findings also highlight other benefits of frozen LMs: avoiding the high cost of training and serving many specialized models for different use cases, while retaining the LM’s versatility, non-forgetfulness, and extensibility.

The proposed approach offers two key advantages over multitask fine-tuned models:

1. Non-forgetfulness: An LM fine-tuned on any multitask suite can suffer from catastrophic forgetting of capabilities unrelated to those tasks. A frozen LM never forgets anything, because its weights never change.

2. Extensibility: When an LM is fine-tuned on a multitask suite, there is no guarantee that its performance on the original tasks will be preserved if a new task is added later. The safe practice is therefore to retrain the model on all tasks at once, which is expensive and impractical. In contrast, when new capabilities are added as external components around a frozen backbone, there is no cross-interference between capabilities.
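The two advantages above can be sketched in a few lines of Python. This is only an assumed, simplified picture of the architecture: `frozen_backbone` is a hypothetical parameter-free stand-in for the frozen LM, and `FrozenSystem`, `add_task`, and the two task modules are names invented for this example rather than anything defined in the paper.

```python
from typing import Callable, Dict

Features = Dict[str, int]


def frozen_backbone(text: str) -> Features:
    """Hypothetical stand-in for a frozen LM: a fixed, parameter-free feature map."""
    return {"n_words": len(text.split()), "n_chars": len(text)}


class FrozenSystem:
    """A shared frozen backbone plus independently registered task modules."""

    def __init__(self, backbone: Callable[[str], Features]) -> None:
        self._backbone = backbone
        self._modules: Dict[str, Callable[[Features], object]] = {}

    def add_task(self, name: str, module: Callable[[Features], object]) -> None:
        # Refuse to overwrite, so a new task can never alter an existing one.
        if name in self._modules:
            raise ValueError(f"task {name!r} already registered")
        self._modules[name] = module

    def run(self, task: str, text: str) -> object:
        return self._modules[task](self._backbone(text))


system = FrozenSystem(frozen_backbone)
system.add_task("is_long", lambda f: f["n_words"] > 5)

sample = "frozen models keep their weights fixed"
before = system.run("is_long", sample)

# Adding a second capability touches neither the backbone nor the first module.
system.add_task("avg_word_len", lambda f: f["n_chars"] / max(f["n_words"], 1))
after = system.run("is_long", sample)
```

Because the backbone is shared and unchanged, `before` and `after` are identical: the first task behaves the same whether or not the second one exists. A fine-tuned multitask model offers no such guarantee, since adding a task means updating the shared weights.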

Paper: https://arxiv.org/pdf/2204.10019.pdf