Do Large Language Models (LLMs) Relearn from Removed Concepts?

In the fast-moving fields of Artificial Intelligence (AI) and Natural Language Processing (NLP), understanding how language models adapt, learn, and retain core concepts is important. In recent research, a team of researchers examined neuroplasticity and the ability of Large Language Models (LLMs) to remap concepts.

Neuroplasticity refers to the ability of models to adjust and restore conceptual representations even after substantial neuron pruning. Whether the pruned neurons are important ones or randomly chosen, models can regain high performance after retraining. This contradicts the conventional idea that removing important neurons would cause permanent performance degradation.
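To make the idea of pruning important neurons concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the saliency measure (mean activation on concept tokens minus mean activation on other tokens), the function names, and the toy activation values are all assumptions made for illustration.

```python
from statistics import mean

def concept_saliency(concept_acts, other_acts):
    """Per-neuron saliency (assumed measure): mean activation on concept
    tokens minus mean activation on all other tokens."""
    n_neurons = len(concept_acts[0])
    return [
        mean(row[j] for row in concept_acts) - mean(row[j] for row in other_acts)
        for j in range(n_neurons)
    ]

def top_k_neurons(saliency, k):
    """Indices of the k most salient neurons, highest saliency first."""
    return sorted(range(len(saliency)), key=lambda j: saliency[j], reverse=True)[:k]

def prune(activations, pruned):
    """Zero out the pruned neurons in every activation row."""
    pruned = set(pruned)
    return [[0.0 if j in pruned else a for j, a in enumerate(row)]
            for row in activations]

# Toy activations (made up): rows are tokens, columns are neurons.
concept_acts = [[0.9, 0.1, 0.8, 0.2], [1.0, 0.2, 0.7, 0.1]]  # e.g. location-name tokens
other_acts = [[0.1, 0.2, 0.1, 0.3], [0.0, 0.1, 0.2, 0.2]]    # all other tokens

saliency = concept_saliency(concept_acts, other_acts)
to_prune = top_k_neurons(saliency, k=2)
pruned_acts = prune(concept_acts, to_prune)
```

In this toy setup, neurons 0 and 2 respond most strongly to the concept, so they are the ones zeroed out. In the setting the article describes, retraining would follow this step, and the question is whether other neurons take over the pruned concept.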

The study emphasizes the importance of neuroplasticity for model editing. Model editing aims to remove unwanted concepts, but neuroplasticity implies that these concepts can resurface after retraining. Building models that are safer, fairer, and better aligned requires an understanding of how concepts are represented, redistributed, and recovered. Understanding how removed concepts are recovered can also improve language models’ robustness.

The study shows that models can recover quickly from pruning by relocating advanced concepts to earlier layers and redistributing pruned concepts to neurons with similar semantics. This implies that LLMs can represent both new and old concepts within a single neuron, a property known as polysemanticity. Although neuron pruning improves the interpretability of model concepts, the findings highlight how difficult it is to permanently remove concepts in order to make models safer.
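One way to picture a neuron that holds both an old and a relearned concept is the sketch below. It is an illustrative assumption rather than the study's procedure: the per-neuron concept scores, the 0.5 threshold, and the helper names are hypothetical. A neuron is flagged when, after retraining, it is salient for the pruned concept while still salient for a concept it already represented before pruning.

```python
def salient_concepts(scores, threshold):
    """Concepts whose saliency score meets the threshold."""
    return {concept for concept, score in scores.items() if score >= threshold}

def polysemantic_relearners(before, after, pruned_concept, threshold=0.5):
    """Neurons that newly represent the pruned concept after retraining
    while keeping at least one concept they represented before."""
    result = {}
    for neuron, scores in after.items():
        now = salient_concepts(scores, threshold)
        old = salient_concepts(before.get(neuron, {}), threshold)
        newly_learned = pruned_concept in now and pruned_concept not in old
        if newly_learned and (now & old):
            result[neuron] = sorted(now)
    return result

# Hypothetical saliency scores per neuron and concept (made-up values).
before = {
    "n1": {"person": 0.8, "location": 0.1},
    "n2": {"organization": 0.7, "location": 0.2},
    "n3": {"date": 0.1, "location": 0.0},
}
after = {
    "n1": {"person": 0.7, "location": 0.6},
    "n2": {"organization": 0.8, "location": 0.1},
    "n3": {"date": 0.1, "location": 0.9},
}

found = polysemantic_relearners(before, after, pruned_concept="location")
```

Here only `n1` is flagged: it keeps "person" and picks up "location". Neuron `n3` also picks up "location" but had no salient concept beforehand, so in this sketch it does not count as combining old and new concepts.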

The team also stresses the importance of tracking the reemergence of concepts and developing strategies to prevent the relearning of unsafe concepts, which is essential for more robust model editing. The study highlights that concept representations in LLMs remain flexible and resilient even after certain concepts are removed. This understanding is essential both for improving the safety and reliability of language models and for advancing the field of model editing.

The team has summarized their primary contributions as follows.

  1. Quick Neuroplasticity: After only a few retraining epochs, the model exhibits neuroplasticity and regains its performance.
  2. Concept Remapping: Concepts removed from later layers are remapped to neurons in earlier layers.
  3. Priming for Relearning: Neurons that recover pruned concepts had previously captured similar concepts, which suggests they may have been primed for relearning.
  4. Polysemantic Neurons: Neurons that relearn concepts show polysemantic properties, combining old and new concepts and demonstrating the model’s capacity to represent several meanings in one neuron.
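The concept remapping point can be illustrated with a simple summary statistic: the saliency-weighted average layer index of a concept, compared before pruning and after retraining. This is a hedged sketch, not a metric taken from the paper; the function name and the per-layer saliency totals are invented for the example.

```python
def layer_center_of_mass(saliency_by_layer):
    """Saliency-weighted average layer index (layer 0 is the earliest)."""
    total = sum(saliency_by_layer)
    if total == 0:
        raise ValueError("no concept saliency in any layer")
    return sum(i * s for i, s in enumerate(saliency_by_layer)) / total

# Hypothetical total concept saliency per layer (made-up values).
before_pruning = [0.0, 1.0, 2.0, 5.0]    # concept lives mostly in the last layer
after_retraining = [1.0, 3.0, 3.0, 1.0]  # concept has spread to earlier layers

shift = layer_center_of_mass(after_retraining) - layer_center_of_mass(before_pruning)
```

A negative `shift` means the concept's representation has moved toward earlier layers, which is the direction of remapping the contributions describe.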

In conclusion, the study focuses on LLMs fine-tuned for named entity recognition. The team pruned important concept neurons and then retrained the model, inducing neuroplasticity, until it regained its performance. The study examines how the distribution of concepts shifts and analyzes the relationship between the concepts previously associated with a pruned neuron and the concepts it relearns.

Check out the Paper. All credit for this research goes to the researchers of this project.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
