This AI Paper Introduces SELF-REFINE: A Framework For Improving Initial Outputs From LLMs Through Iterative Feedback And Refinement

Iterative refinement is a key aspect of human problem-solving. It involves producing an initial draft and then improving it through self-feedback. For instance, when writing an email to ask a coworker for a document, a person might first write a blunt request such as “give me the data immediately.” After some thought, however, the writer could realize that the phrase might come across as unfriendly and change it to “Could you kindly provide me the data?” In this study, the researchers show that large language models (LLMs) can successfully mimic this human cognitive process through iterative feedback and refinement.

Although LLMs can produce coherent outputs on the first attempt, they often fall short on more complex requirements. This is especially true for tasks with multiple objectives (such as dialogue response generation, where the response should be relevant, engaging, and safe) or with less clearly defined goals (e.g., improving program readability). In such cases, modern LLMs may produce understandable output, but iterative improvement is still needed to ensure that all task requirements are addressed and that the desired level of quality is reached.

Advanced methods that rely on third-party reward or supervision models require either enormous amounts of training data or expensive human annotations, which are often impractical to obtain. These drawbacks highlight the need for a more adaptable and efficient approach to text generation that can be applied to many tasks with little supervision. In this study, researchers from CMU, the Allen Institute, the University of Washington, NVIDIA, UCSD, and Google Research propose SELF-REFINE to overcome these constraints and better replicate the human creative process without a costly human feedback loop (Figure 1).

Figure 1: SELF-REFINE first takes an initial output generated by a model M and passes it back to the same model M to obtain feedback. The feedback on that output is then given back to the model, which refines the previously generated output, and this cycle repeats. SELF-REFINE is instantiated with a strong language model such as GPT-3.5 and requires no human assistance.

SELF-REFINE has two components, FEEDBACK and REFINE, which work together in an iterative cycle to produce high-quality results. An initial draft produced by model M is passed back to the same model M to obtain feedback. That feedback is then given to the same model, which uses it to refine the previously produced output. The procedure repeats until the model judges that no further improvement is needed, at which point it stops. The central idea of this study is that, in a few-shot setting, a single underlying language model can handle both feedback and refinement.
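To make the loop concrete, here is a minimal Python sketch of the generate, feedback, and refine cycle. It assumes three hypothetical callables, `generate`, `feedback`, and `refine`. In SELF-REFINE each of these would be a few-shot prompt sent to the same model M. In the sketch they are toy rule-based stand-ins built around the email example, not an LLM. The `max_iters` cap is an added assumption that bounds the loop alongside the model's own "no further improvement" signal, which is represented here by `feedback` returning `None`.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: in a real system each would be a few-shot
# prompt sent to the SAME language model M.
Generate = Callable[[str], str]                 # input -> initial output
Feedback = Callable[[str, str], Optional[str]]  # (input, output) -> feedback, or None to stop
Refine = Callable[[str, str, str], str]         # (input, output, feedback) -> improved output


def self_refine(x: str, generate: Generate, feedback: Feedback, refine: Refine,
                max_iters: int = 4) -> Tuple[str, List[str]]:
    """Run generate -> (feedback -> refine)* and return the final output and all drafts."""
    y = generate(x)
    drafts = [y]
    for _ in range(max_iters):  # assumed safety cap on the number of refinement rounds
        fb = feedback(x, y)
        if fb is None:  # the "model" judges that no further improvement is needed
            break
        y = refine(x, y, fb)
        drafts.append(y)
    return y, drafts


# Toy rule-based stand-ins for the email example (NOT an LLM).
def toy_generate(x: str) -> str:
    return "Give me the data immediately."


def toy_feedback(x: str, y: str) -> Optional[str]:
    if "immediately" in y.lower():
        return "The request sounds demanding; soften the tone."
    if not y.endswith("?"):
        return "Phrase it as a polite question."
    return None


def toy_refine(x: str, y: str, fb: str) -> str:
    if "soften" in fb:
        return "Please give me the data."
    return "Could you kindly provide me the data?"


final, drafts = self_refine("Ask a coworker for a document",
                            toy_generate, toy_feedback, toy_refine)
print(final)        # Could you kindly provide me the data?
print(len(drafts))  # 3
```

Keeping every draft in `drafts` makes it easy to inspect how the output changed across iterations. Swapping the toy functions for real model calls would not change the control flow.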

The authors present SELF-REFINE as the first iterative approach that effectively improves generation using natural language (NL) feedback.

Figure 1 illustrates the procedure with an example. The researchers apply SELF-REFINE to a range of tasks that span many domains and call for feedback and revision: review rewriting, acronym generation, constrained generation, story generation, code rewriting, response generation, and toxicity removal. The core components are instantiated with few-shot prompting, in which a handful of examples in the prompt guide the model's behavior. The authors' analysis of the iterative approach covers experiments, component analysis, a variety of tasks, the generation of useful feedback, and stopping criteria, and is intended to guide future research in this field.
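As a rough illustration of few-shot prompting, the sketch below assembles a FEEDBACK prompt from in-context demonstrations followed by the new output. It leaves a final `Feedback:` slot open for the model to complete. The `Input:/Output:/Feedback:` layout, the `FeedbackExample` class, and `build_feedback_prompt` are all hypothetical. The task-specific prompts actually used in the paper are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeedbackExample:
    """One hypothetical in-context demonstration for the FEEDBACK step."""
    task_input: str
    output: str
    feedback: str


def build_feedback_prompt(examples: List[FeedbackExample], task_input: str, output: str) -> str:
    """Concatenate demonstrations, then end with an open 'Feedback:' slot for the model."""
    blocks = [
        f"Input: {ex.task_input}\nOutput: {ex.output}\nFeedback: {ex.feedback}"
        for ex in examples
    ]
    # The new case has no feedback yet; the model is expected to continue from here.
    blocks.append(f"Input: {task_input}\nOutput: {output}\nFeedback:")
    return "\n\n".join(blocks)


demos = [
    FeedbackExample("Ask a coworker for a document",
                    "Send me the report now.",
                    "Too abrupt; phrase it as a polite request."),
]
prompt = build_feedback_prompt(demos, "Ask a coworker for the data",
                               "Give me the data immediately.")
print(prompt)
```

A REFINE prompt could be built the same way, with demonstrations that pair an output and its feedback with an improved version.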

Their contributions, in brief, are: 

  1. To help LLMs perform better on a variety of tasks, they propose SELF-REFINE, a novel technique that lets a model repeatedly improve its outputs using its own feedback. Unlike earlier efforts, their method requires only a single LLM and does not use reinforcement learning or supervised training data.
  2. They conduct extensive experiments on seven different tasks (review rewriting, acronym generation, story generation, code rewriting, response generation, constrained generation, and toxicity removal) and show that SELF-REFINE performs at least 5% better, and sometimes more than 40% better, than direct generation from strong generators such as GPT-3.5 and even GPT-4.


Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
