Large Language Models have made an indelible mark on the Artificial Intelligence community. Models like GPT, T5, and PaLM have grown rapidly in popularity. These models imitate humans by learning to read, summarize, and generate textual data. Their recent impact on AI has contributed to a wide range of industries, including healthcare, finance, education, and entertainment.
Aligning Large Language Models with human values and intentions has been a constant challenge in the field of Generative AI, specifically in producing responses that are comprehensive, respectful, and compliant. With the immense popularity of the GPT-based ChatGPT, this issue has come into the limelight. Current AI systems depend heavily on supervised fine-tuning with human instructions and annotations, and on reinforcement learning from human feedback (RLHF), to align models with human preferences. However, this approach requires extensive human supervision, which is both expensive and potentially problematic: human-provided annotations can suffer from issues of quality, reliability, and diversity, and can introduce undesirable biases.
To address these issues and minimize the dependence of LLMs on intensive human annotation, a team of researchers proposed an approach called SELF-ALIGN. SELF-ALIGN aligns LLM-based AI agents with human values while requiring virtually no human annotation. It uses a small set of human-defined principles, or rules, to guide the behavior of the AI agent when generating responses to user queries.
The researchers applied the SELF-ALIGN approach to the LLaMA-65b base language model to develop an AI assistant named Dromedary, which achieves significant performance improvements over current AI systems such as Text-Davinci-003 and Alpaca while using fewer than 300 lines of human annotations. The code, the LoRA weights of Dromedary, and the synthetic training data have been open-sourced to encourage further research on aligning LLM-based AI agents with greater supervision efficiency, reduced bias, and improved controllability.
The approach involves four stages:
1. Self-Instruct: This stage uses the self-instruct mechanism to generate synthetic instructions from 175 seed prompts and an additional 20 topic-specific prompts. These instructions provide a comprehensive range of contexts and scenarios for the AI system to learn from.
2. Principle-Driven Self-Alignment: In this stage, a small set of 16 human-written principles is provided in English, outlining the desirable qualities of the system's responses. These principles serve as guidelines for generating helpful, ethical, and reliable answers. The approach uses in-context learning (ICL) with a few demonstrations that illustrate how the AI system adheres to the rules when formulating responses in different cases.
3. Principle Engraving: In this stage, the original LLM is fine-tuned on the self-aligned responses that it generated through prompting, with the principles and demonstrations pruned from the prompts. The fine-tuned LLM can then directly generate responses that align well with the principles, without needing them in context.
4. Verbose Cloning: The final stage uses context distillation to enhance the system's ability to produce more comprehensive and elaborate responses.
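The first three stages above can be sketched in a few lines of Python. This is a minimal, hedged illustration only: the principle texts, the demonstration, the `build_self_align_prompt` helper, and the `generate` stub are hypothetical placeholders, not the paper's actual 16 principles, prompts, or model code.

```python
# Illustrative sketch of the SELF-ALIGN pipeline (stages 1-3).
# All strings and function names here are assumptions for demonstration,
# not the paper's actual artifacts.

PRINCIPLES = [
    "1 (ethical): do not produce harmful or misleading content.",
    "2 (informative): respond with accurate, relevant information.",
    "3 (helpful): address the user's actual question.",
]  # the paper uses 16 such human-written principles

# One in-context-learning (ICL) demonstration of rule-following behavior.
DEMONSTRATION = (
    "User: How do I pick a strong password?\n"
    "Assistant (following principles 2, 3): Use a long, random passphrase "
    "stored in a password manager, and never reuse passwords across sites.\n"
)

def build_self_align_prompt(instruction: str) -> str:
    """Stage 2: compose principles + ICL demonstration + a new instruction."""
    rules = "\n".join(PRINCIPLES)
    return (
        "You are an AI assistant. Follow these principles:\n"
        f"{rules}\n\n{DEMONSTRATION}\nUser: {instruction}\nAssistant:"
    )

def generate(prompt: str) -> str:
    """Stub standing in for the base LLM (e.g., LLaMA-65b)."""
    return "Here is a principle-aligned draft response."

# Stage 1 (self-instruct) would yield many synthetic instructions; one suffices here.
synthetic_instruction = "Summarize the benefits of unit testing."
prompt = build_self_align_prompt(synthetic_instruction)
response = generate(prompt)

# Stage 3 (principle engraving) fine-tunes on (instruction, response) pairs
# with the principles and demonstrations pruned from the prompt:
engraving_example = {"prompt": synthetic_instruction, "completion": response}
print(engraving_example)
```

The key design point this sketch captures is that the principles appear only at data-generation time (stage 2); the fine-tuning pairs in stage 3 deliberately omit them, so the aligned behavior is "engraved" into the model's weights rather than carried in the prompt.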
In conclusion, Dromedary shows promise that a bootstrapped LLM can align itself with human values using minimal human supervision.
Check out the Paper and GitHub link.
Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.