Researchers from ETH Zurich and Microsoft Introduce SCREWS: An Artificial Intelligence Framework for Enhancing the Reasoning in Large Language Models

Large Language Models (LLMs) have succeeded at a range of reasoning tasks. Because their output is not always correct on the first try, it is sometimes necessary to refine the results iteratively until the intended goal is met. These refinement techniques assume that successive results (from the same model, an external model, or some tool) lead to improved performance. However, there is no guarantee that a later version will be better: as Figure 1 shows, a refinement can change a correct answer into an incorrect one. This motivates a selection step that lets the model fall back to an earlier result. Furthermore, prior work on iterative refinement typically uses a single, fixed reasoning technique, whereas humans are more adaptable.

Figure 1: A case study illustrative of how Conditional Resampling (also known as “refinement”) may result in improper modification of the initial response. The original response, which in this case is the right one, may be chosen by a selection module in place of the alteration.

A product manager may use a brainstorming technique to generate several ideas before switching to a prioritization technique to rank them by feasibility or impact. Similarly, a student preparing for an exam might use deductive reasoning to solve problems and inductive reasoning to verify the results. The authors therefore propose a modular approach to answer refinement that makes it possible to try different strategies. In this paper, researchers from ETH Zurich and Microsoft Semantic Machines present SCREWS, a modular framework for reasoning with revisions. Its three core modules, Sampling, Conditional Resampling, and Selection, are shown in detail in Figure 2. SCREWS is instantiated for a specific task and input sequence by fixing the submodule used for each module (for example, “Chain of Thought” for Sampling).
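As a rough illustration of what "fixing the submodules for each module" could look like in practice, the sketch below models one SCREWS instantiation as a small validated configuration. The class name `ScrewsConfig`, the registry `AVAILABLE`, and every submodule name other than Chain of Thought are hypothetical placeholders chosen for this example, not identifiers from the paper or its code.

```python
from dataclasses import dataclass

# Hypothetical registry of submodule choices per module. Only
# "chain_of_thought" is named in the article; the rest are placeholders.
AVAILABLE = {
    "sampling": {"chain_of_thought", "answer_only"},
    "conditional_resampling": {"self_ask", "tool_based"},
    "selection": {"llm_based", "majority_vote"},
}


@dataclass(frozen=True)
class ScrewsConfig:
    """One instantiation: exactly one submodule fixed per module."""
    sampling: str
    conditional_resampling: str
    selection: str

    def __post_init__(self):
        # Reject any choice that is not registered for its module.
        for module, choice in vars(self).items():
            if choice not in AVAILABLE[module]:
                raise ValueError(f"unknown {module} submodule: {choice!r}")


# A configuration chosen for one particular task (assumed example).
config = ScrewsConfig(
    sampling="chain_of_thought",
    conditional_resampling="self_ask",
    selection="llm_based",
)
```

Keeping the choice per module explicit makes it easy to swap a single submodule, such as the selection strategy, while leaving the other two unchanged.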

Figure 2: A high-level view of the modular SCREWS framework for reasoning with revisions. Each of the three large boxes (“modules”) contains several options (“submodules”). Many previous methods, including Self-Refine, Least to Most, LLMs Know (Mostly), Self-Consistency, Self-Improve, PHP CoT, Self-Correct, Socratic CoT, Program of Thoughts, and many more, can be seen as instances of the framework. (…) denotes additional submodules that could be added to each module, including, but not limited to, cached memory or online search for the Sampling module, a fine-tuned model or an external verifier for Conditional Resampling, and selection based on humans or an oracle for the Selection module.

The initial outputs from Sampling are passed to Conditional Resampling, which decides whether to generate a revision based on the original sample and does so if necessary. The Selection module then chooses the best among all the samples and revisions. Because the framework is modular, its other components can be used to improve several recently proposed self-refinement approaches. One example is combining the authors’ model-based selection technique with a self-refinement method, which can improve overall performance. The authors use ChatGPT or GPT-4 to evaluate SCREWS on various reasoning tasks, including multi-hop question answering, arithmetic reasoning, and code debugging.
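The flow from Sampling through Conditional Resampling to Selection can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: `run_screws`, the three callable types, and the toy stubs are all hypothetical, and in a real system each stub would be replaced by an LLM or tool call.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces for the three modules (not from the paper's code).
Sampler = Callable[[str], str]                   # question -> first answer
Resampler = Callable[[str, str], Optional[str]]  # -> revision, or None to keep
Selector = Callable[[str, List[str]], str]       # picks one candidate


def run_screws(question: str, sample: Sampler,
               resample: Resampler, select: Selector) -> str:
    """Sample once, optionally revise, then select among all candidates."""
    first = sample(question)
    candidates = [first]
    revision = resample(question, first)
    if revision is not None and revision != first:
        candidates.append(revision)
    return select(question, candidates)


# Toy stubs standing in for model calls (assumptions for illustration only).
def toy_sample(question: str) -> str:
    return "12"  # a correct first answer to "3 * 4"


def toy_resample(question: str, answer: str) -> Optional[str]:
    return "7"   # a harmful "refinement", as in the Figure 1 case study


def toy_select(question: str, candidates: List[str]) -> str:
    # Stand-in for a selector: keep the candidate that passes a simple
    # check, otherwise fall back to the most recent candidate.
    a, b = (int(part) for part in question.split("*"))
    for candidate in candidates:
        if int(candidate) == a * b:
            return candidate
    return candidates[-1]


result = run_screws("3 * 4", toy_sample, toy_resample, toy_select)
```

In this toy run the revision is wrong, so the selector falls back to the original sample, which is the rollback behavior that a selection step is meant to provide.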

Compared to standard sampling and resampling procedures, the proposed strategies yield significant improvements (10–15%). The authors demonstrate the value of heterogeneous resampling, showing that it can influence the model’s reasoning and substantially improve on the baselines at a very low overall cost. They also explain the importance of model-based selection, a crucial component that enables contemporary LLMs to revert to earlier, more confident outputs.

Check out the Paper. All credit for this research goes to the researchers on this project.


Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
