With the increasing popularity of Large Language Models (LLMs), new research and advancements are introduced almost every day. Built on deep learning, LLMs are evolving rapidly and are being applied across a growing range of domains. LLMs are first trained on massive amounts of raw text and are then fine-tuned to improve their performance. During fine-tuning, an LLM is trained on particular tasks, such as classification, question answering, or document summarization, using direct training signals that measure its performance on those tasks (for example, classification accuracy).
Recently, a new fine-tuning paradigm called LETI (Learn from Textual Interactions) has been introduced, which explores whether Large Language Models can learn from textual interactions and feedback. LETI lets language models understand not just whether they were wrong but why they were wrong. This allows LLMs to move beyond the limitations of learning solely from labels and scalar rewards.
According to the researchers behind LETI, the approach gives the language model textual feedback in addition to binary labels: the labels indicate whether the model's outputs are correct, while the textual feedback identifies and explains the errors in its generated code. LETI mirrors the iterative process of software development, in which a developer writes a program, tests it, and improves it based on the results. In the same way, LETI fine-tunes the LLM with textual feedback that pinpoints bugs and errors.
During fine-tuning, the model is prompted with a natural-language problem description and generates a set of candidate solutions. A Solution Evaluator, which in this work is a Python interpreter, then runs these solutions against a set of test cases. The error messages and stack traces produced when the generated code fails serve as the source of textual feedback.
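The evaluation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LETI code: it assumes test cases are given as assert statements, runs each candidate program plus its tests in a fresh interpreter via the standard `subprocess` module, and treats a zero exit code as a pass and the interpreter's stderr (the traceback) as the textual feedback. The names `evaluate_solution` and `Feedback` and the 5-second timeout are hypothetical, and a real system would also sandbox the untrusted generated code.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Feedback:
    """Hypothetical container for the two signals the evaluator returns."""
    passed: bool  # binary signal: did every test case pass?
    text: str     # textual signal: traceback from the interpreter ("" on success)


def evaluate_solution(program: str, tests: list[str], timeout: float = 5.0) -> Feedback:
    """Hypothetical helper: run a generated program followed by its assert-style
    test cases in a fresh Python interpreter and turn the outcome into feedback."""
    source = program + "\n\n" + "\n".join(tests) + "\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return Feedback(False, f"TimeoutError: program did not finish within {timeout} s")
    if result.returncode == 0:
        return Feedback(True, "")
    # Uncaught exceptions (including failed asserts) print a traceback to stderr
    # and make the interpreter exit with a non-zero status.
    return Feedback(False, result.stderr.strip())


if __name__ == "__main__":
    buggy = "def add(a, b):\n    return a - b"
    fb = evaluate_solution(buggy, ["assert add(2, 3) == 5"])
    print(fb.passed)                 # False
    print(fb.text.splitlines()[-1])  # AssertionError
```

Running the whole program in a separate process keeps a crashing or looping solution from affecting the training loop, and the captured traceback is exactly the kind of error message and stack trace the text describes as feedback.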
The training data used for fine-tuning consists of three components: natural-language instructions, LM-generated programs, and textual feedback. When a generated program fails to solve the problem, the textual feedback is provided to the LLM. Otherwise, the model receives a reward token, a form of binary feedback that encourages it to generate correct solutions. The LM is then fine-tuned conditioned on this feedback, a process the researchers call Feedback-Conditioned Fine-Tuning.
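The three components can be combined into a single training example in many ways; the sketch below shows one hypothetical format. The special tokens `<|good|>` and `<|bad|>`, the field order, and the choice to attach a "bad" token to failing programs alongside their error text are assumptions for illustration, not details confirmed by the article. "Conditioned on the feedback" is modeled here by placing the feedback before the program, so a causal LM learns to generate the program given that feedback.

```python
from dataclasses import dataclass

# Hypothetical special tokens standing in for the binary reward signal.
GOOD_TOKEN = "<|good|>"
BAD_TOKEN = "<|bad|>"


@dataclass
class Example:
    instruction: str  # natural-language problem description
    program: str      # LM-generated solution
    passed: bool      # outcome reported by the Solution Evaluator
    feedback: str     # interpreter error message / stack trace ("" if passed)


def to_training_text(ex: Example) -> str:
    """Hypothetical format: put the feedback first so the LM is trained to
    produce the program conditioned on that feedback."""
    if ex.passed:
        prefix = GOOD_TOKEN  # binary reward only for a passing program
    else:
        prefix = BAD_TOKEN + "\n" + ex.feedback  # assumed: token plus textual feedback
    return f"{prefix}\n{ex.instruction}\n{ex.program}"


if __name__ == "__main__":
    failing = Example(
        instruction="Write a function add(a, b) that returns the sum of a and b.",
        program="def add(a, b):\n    return a - b",
        passed=False,
        feedback="AssertionError",
    )
    print(to_training_text(failing))
```

Because everything is serialized into one string, the same language-modeling objective used for pretraining can be applied unchanged; only the content of the prefix tells the model whether the program that follows passed or failed.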
For evaluation, the researchers used MBPP (Mostly Basic Python Problems), a dataset of code generation tasks. The results show that LETI significantly improves the performance of two base LMs of different scales on MBPP without requiring ground-truth outputs for training. On the HumanEval dataset, LETI achieves similar or better performance than the base LMs on unseen problems. Moreover, the researchers found that, compared with binary feedback, textual feedback lets the model reach the same performance in fewer gradient steps.
In conclusion, LETI is a promising fine-tuning approach that improves language models by using detailed textual feedback, enabling them to learn from their mistakes and perform better on tasks such as code generation.
Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.