New AI Research From Stanford Presents an Alternative Explanation for the Seemingly Sharp and Unpredictable Emergent Abilities of Large Language Models

Researchers have long studied the emergent properties of complex systems, from physics to biology to mathematics. Nobel Prize-winning physicist P.W. Anderson’s commentary “More Is Different” is one notable example. It argues that as a system’s complexity rises, new properties may appear that cannot (easily or at all) be predicted, even from a precise quantitative understanding of the system’s microscopic details. Emergence has lately attracted a lot of interest in machine learning, following observations that large language models (LLMs) such as GPT, PaLM, and LaMDA may exhibit so-called “emergent abilities” across a variety of tasks.

A recent, succinct definition states that “emergent abilities of LLMs” are “abilities that are not present in smaller-scale models but are present in large-scale models; thus, they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models.” Such emergent abilities may have first been observed in the GPT-3 family. Later works emphasized the finding, writing that although “performance is predictable at a general level, performance on a specific task can sometimes emerge quite unpredictably and abruptly at scale”; in fact, these emergent abilities were considered so startling and remarkable that it was argued that such “abrupt, specific capability scaling” should be regarded as one of the two main defining features of LLMs. The phrases “sharp left turns” and “breakthrough capabilities” have also been used.

These quotations identify the two characteristics that distinguish emergent abilities in LLMs:

1. Sharpness: transitioning seemingly instantaneously from absent to present

2. Unpredictability: transitioning at seemingly unforeseeable model scales

These emergent abilities have attracted a lot of interest, prompting questions such as: What determines which abilities will emerge? What determines when they will emerge? How can desirable abilities be made to emerge sooner while ensuring that undesirable ones never emerge? Emergent abilities make these questions especially relevant to AI safety and alignment, because they suggest that larger models might one day, without warning, acquire unwanted mastery of dangerous capabilities.

In this study, researchers from Stanford question the claim that LLMs possess emergent abilities, or more precisely, abrupt and unpredictable changes in model outputs as a function of model scale on particular tasks. Their skepticism stems from the observation that emergent abilities seem to appear only under metrics that nonlinearly or discontinuously scale a model’s per-token error rate. For instance, they show that on BIG-Bench tasks, more than 92% of emergent abilities appear under one of two metrics: Multiple Choice Grade, defined as 1 if the highest probability mass is on the correct option and 0 otherwise; and Exact String Match, defined as 1 if the output string exactly matches the target string and 0 otherwise.
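
To make the all-or-nothing nature of these two metrics concrete, here is a minimal Python sketch. The function names and signatures are illustrative assumptions, not code from the paper or from BIG-Bench, and breaking ties in favor of the first option is likewise a choice made for this sketch.

```python
from typing import Sequence


def multiple_choice_grade(option_probs: Sequence[float], correct_index: int) -> int:
    """Return 1 if the correct option has the highest probability mass, else 0.

    Hypothetical helper: ties are resolved in favor of the first option.
    """
    best_index = max(range(len(option_probs)), key=lambda i: option_probs[i])
    return int(best_index == correct_index)


def exact_string_match(output: str, target: str) -> int:
    """Return 1 only if the output equals the target exactly, else 0."""
    return int(output == target)


# A near miss earns no partial credit under either metric.
print(multiple_choice_grade([0.40, 0.35, 0.25], correct_index=1))
print(exact_string_match("12345", "12346"))
```

Both calls score the near miss as a plain 0, which is why gradual improvements below the threshold are invisible under these metrics.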

This suggests a different explanation for LLMs’ emergent abilities: changes that appear abrupt and unpredictable may be induced by the researcher’s choice of measurement, even though the model family’s per-token error rate changes smoothly, continuously, and predictably with increasing model scale.
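
A toy calculation can illustrate how a smooth per-token curve turns into an apparent jump. The sketch below assumes a power-law cross-entropy in the number of parameters, independent token errors, and arbitrary constants (`c`, `alpha`, and a 10-token target); all of these are assumptions chosen for illustration, not fitted values.

```python
import math


def per_token_accuracy(num_params: float, c: float = 1e9, alpha: float = -0.5) -> float:
    """Smooth per-token accuracy, modeled as exp(-cross_entropy).

    Assumes cross_entropy = (num_params / c) ** alpha with alpha < 0;
    c and alpha are illustrative constants, not fitted values.
    """
    cross_entropy = (num_params / c) ** alpha
    return math.exp(-cross_entropy)


def exact_match_accuracy(num_params: float, target_length: int) -> float:
    """Chance that every token of the target is correct, assuming independence."""
    return per_token_accuracy(num_params) ** target_length


def expected_token_errors(num_params: float, target_length: int) -> float:
    """A linear metric: expected number of wrong tokens in the target."""
    return target_length * (1.0 - per_token_accuracy(num_params))


TARGET_LENGTH = 10  # assumed target length for the demo
for exponent in range(7, 13):
    n = 10.0 ** exponent
    print(
        f"params=1e{exponent}  per-token={per_token_accuracy(n):.4f}  "
        f"exact-match={exact_match_accuracy(n, TARGET_LENGTH):.6f}  "
        f"expected errors={expected_token_errors(n, TARGET_LENGTH):.2f}"
    )
```

Under these assumptions, the exact-match column stays near zero across several orders of magnitude and then climbs quickly, while the per-token accuracy and the expected number of token errors change gradually over the same range.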

Specifically, they argue that emergent abilities are a mirage caused primarily by three factors: the researcher’s choice of a metric that nonlinearly or discontinuously deforms per-token error rates; too little test data to accurately estimate the performance of smaller models (so that smaller models appear wholly incapable of performing the task); and the evaluation of too few large-scale models. They present a simple mathematical model to express this alternative view and show that it quantitatively reproduces the evidence offered in support of emergent LLM abilities.
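
The test-data point can be illustrated with a short Python sketch: with a small test set, a model whose true accuracy is low but nonzero will usually score exactly zero. The helper names and the numbers in the demo are hypothetical, and test examples are assumed to be independent.

```python
def smallest_nonzero_accuracy(num_test_examples: int) -> float:
    """Finest accuracy step a test set of this size can resolve."""
    return 1.0 / num_test_examples


def prob_observing_zero(true_accuracy: float, num_test_examples: int) -> float:
    """Chance of scoring 0 correct, assuming independent test examples."""
    return (1.0 - true_accuracy) ** num_test_examples


true_accuracy = 1e-4  # assumed accuracy of a small model, for illustration
for n in (100, 10_000, 100_000):
    print(
        f"n={n}: resolution={smallest_nonzero_accuracy(n):.0e}, "
        f"P(measured accuracy is 0)={prob_observing_zero(true_accuracy, n):.3f}"
    )
```

In this setup, the small test set almost always reports an accuracy of zero, so the model looks wholly incapable even though its true accuracy is above zero.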

They then test their alternative explanation in three complementary ways:

1. Using the InstructGPT / GPT-3 model family, they make, test, and confirm three predictions based on their alternative hypothesis.

2. They conduct a meta-analysis of previously published results and show that, in the space of task-metric-model family triplets, emergent abilities appear only for certain metrics and not for particular model families on particular tasks. They further show that, for fixed model outputs, changing the metric makes the emergence phenomenon disappear.

3. They intentionally induce seemingly emergent abilities in deep neural networks of various architectures on multiple vision tasks (something that, to the best of their knowledge, has never been demonstrated before) to show how similar metric choices can produce apparent emergence.
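
The idea of rescoring fixed model outputs with a different metric can be sketched as follows. The model outputs below are invented for illustration, and token edit distance is used here as one example of a continuous metric; this is a sketch under those assumptions, not the study's evaluation code.

```python
from typing import Sequence


def exact_match(output: Sequence[str], target: Sequence[str]) -> int:
    """Discontinuous metric: 1 only if every token matches, else 0."""
    return int(list(output) == list(target))


def token_edit_distance(output: Sequence[str], target: Sequence[str]) -> int:
    """Levenshtein distance over tokens (insert, delete, substitute cost 1)."""
    previous = list(range(len(target) + 1))
    for i, out_tok in enumerate(output, start=1):
        current = [i]
        for j, tgt_tok in enumerate(target, start=1):
            substitution = previous[j - 1] + (out_tok != tgt_tok)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


target = "1 2 3 4 5".split()
# Hypothetical outputs from three model sizes on the same prompt.
fixed_outputs = {
    "small": "9 8 3 0 7".split(),
    "medium": "1 2 3 0 7".split(),
    "large": "1 2 3 4 5".split(),
}

for name, output in fixed_outputs.items():
    print(name, exact_match(output, target), token_edit_distance(output, target))
```

With these made-up outputs, exact match reports no progress until the largest model, whereas the edit distance shrinks step by step across model sizes, even though the outputs being scored are identical in both cases.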

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.