Large language models (LLMs) can quickly adapt to new tasks through in-context learning when given a few demonstrations and natural language instructions. This avoids hosting the LLM or annotating large datasets, but performance suffers on multistep reasoning, math, access to up-to-date information, and other areas. To address these limitations, recent research proposes either giving LLMs access to tools that support more sophisticated reasoning steps or prompting them to generate a chain of reasoning for multistep problems. Nevertheless, existing approaches to chained reasoning with tool use are hard to adapt to new tasks and tools: doing so requires fine-tuning or prompt engineering specialized for a particular task or tool.
Researchers from the University of Washington, Microsoft, Meta, the University of California, and the Allen Institute for AI present Automatic Reasoning and Tool-use (ART), a framework that automatically generates decompositions (multistep reasoning) for instances of new tasks. ART retrieves demonstrations of related tasks from a task library to enable few-shot decomposition and tool use for the new task. These demonstrations use a flexible yet structured query language that makes it simple to parse intermediate steps, pause generation to call external tools, and resume generation once the output of those tools has been included (Figure 1). The framework also selects and uses the most appropriate tools (such as search engines and code execution) at each step.
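The pause, call-tool, and resume cycle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `[tool] input` step format, the `-> output` line used to splice results back in, the two toy tools, and the `scripted_model` function that stands in for a real LLM. It is not ART's actual query language or code.

```python
import operator
from typing import Callable, Dict, Tuple

# Hypothetical tool library; the names and behaviours are illustrative only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def arithmetic_tool(text: str) -> str:
    """Evaluate 'left op right' for integer operands, e.g. '7 * 6'."""
    left, op, right = text.split()
    return str(OPS[op](int(left), int(right)))


def lookup_tool(text: str) -> str:
    """Toy stand-in for a search engine backed by a fixed dictionary."""
    facts = {"days in a week": "7"}
    return facts.get(text.strip().lower(), "unknown")


TOOLS: Dict[str, Callable[[str], str]] = {
    "arithmetic": arithmetic_tool,
    "lookup": lookup_tool,
}


def scripted_model(prompt: str) -> str:
    """Stand-in for an LLM: returns the next step given the program so far."""
    last_line = prompt.strip().splitlines()[-1]
    if "[lookup]" not in prompt:
        return "[lookup] days in a week"
    if "[arithmetic]" not in prompt:
        # Reuse the tool output that was spliced into the prompt.
        days = last_line.split("-> ", 1)[1]
        return f"[arithmetic] {days} * 6"
    return "[answer] " + last_line.split("-> ", 1)[1]


def run_with_tools(
    question: str,
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> Tuple[str, str]:
    """Alternate between model steps and tool calls until an answer appears."""
    prompt = f"Q: {question}\n"
    for _ in range(max_steps):
        step = model(prompt)              # generate one step
        prompt += step + "\n"
        tag, _, payload = step.partition("] ")
        tag = tag.lstrip("[")
        if tag == "answer":
            return payload, prompt
        output = tools[tag](payload)      # pause generation and call the tool
        prompt += f"-> {output}\n"        # splice the output in, then resume
    raise RuntimeError("no answer within max_steps")


if __name__ == "__main__":
    answer, trace = run_with_tools(
        "How many days are in 6 weeks?", scripted_model, TOOLS
    )
    print(trace)
    print("Answer:", answer)
```

Because each step is a separate line with an explicit tool tag, the controller can tell when to stop generating and which tool to call, which is the property the structured format is meant to provide.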
ART gives the LLM demonstrations of how to decompose instances of several related tasks and how to select and use any tool from the tool library that appears in these demonstrations. This helps the model generalize from the demonstrations to decompose new tasks and use the right tools for the job, zero-shot. Users can also update the task and tool libraries, adding new demonstrations as needed to correct errors in the reasoning chain or to add new tools (e.g., for the task at hand).
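As a rough illustration of how a task library could supply demonstrations and accept user updates, the sketch below ranks library entries by word overlap with the new task description and builds a few-shot prompt from the top matches. The `TaskEntry` type, the Jaccard word-overlap selector, and the sample entries are all assumptions chosen for brevity; the article does not say how ART measures task similarity, so this selector is a stand-in.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskEntry:
    name: str
    description: str
    demonstration: str  # a worked decomposition for this task


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (a stand-in selector)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def select_demonstrations(
    library: List[TaskEntry], new_task: str, k: int = 2
) -> List[TaskEntry]:
    """Return the k library entries most similar to the new task."""
    ranked = sorted(
        library,
        key=lambda entry: word_overlap(entry.description, new_task),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(library: List[TaskEntry], new_task: str, k: int = 2) -> str:
    """Concatenate the selected demonstrations, then append the new task."""
    demos = select_demonstrations(library, new_task, k)
    parts = [entry.demonstration for entry in demos]
    parts.append(f"Task: {new_task}")
    return "\n\n".join(parts)


def example_library() -> List[TaskEntry]:
    """Hypothetical sample entries, not taken from the ART task library."""
    return [
        TaskEntry("arithmetic", "solve arithmetic word problems with numbers",
                  "Task: arithmetic\n[arithmetic] 2 + 3\n[answer] 5"),
        TaskEntry("date", "answer questions about calendar dates",
                  "Task: date\n[lookup] today\n[answer] Monday"),
        TaskEntry("pig_latin", "translate english words into pig latin",
                  "Task: pig latin\n[answer] ellohay"),
    ]


if __name__ == "__main__":
    library = example_library()
    task = "solve arithmetic problems about money"
    print(build_prompt(library, task))

    # A user adds a new demonstration; no model retraining is involved.
    library.append(
        TaskEntry("money", "solve arithmetic problems about money and prices",
                  "Task: money\n[arithmetic] 4 * 3\n[answer] 12")
    )
    print(select_demonstrations(library, task, k=1)[0].name)
```

The point of the design is that correcting or extending behaviour only requires appending an entry to a list, which mirrors the claim that users can improve results by editing the libraries rather than fine-tuning the model.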
The researchers build a task library from 15 BigBench tasks and evaluate ART on 19 unseen BigBench test tasks, 6 MMLU tasks, and several tasks from related tool-use research (SQuAD, TriviaQA, SVAMP, MAWPS). On 32 of 34 BigBench tasks and all MMLU tasks, ART consistently matches or surpasses automatically generated chain-of-thought (CoT) reasoning chains, by over 22 percentage points on average. When tools are allowed, performance on test tasks increases by an average of around 12.3 percentage points compared to when they are not.
On average, ART outperforms direct few-shot prompting on both BigBench and MMLU tasks by 10.8 percentage points. On unseen tasks that demand mathematical and algorithmic reasoning, ART outperforms direct few-shot prompting by 12.5% and outperforms the best-known GPT-3 results, including those that use supervision for decomposition and tool use, by 6.1 percentage points. Updating the task and tool libraries with new examples lets humans intervene in and improve the reasoning process, so performance on a given task can be raised with minimal human input. On 12 test tasks, ART outperforms the best-known GPT-3 results by an average of over 20 percentage points when given additional human feedback.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.