HUSKY: A Unified, Open-Source Language Agent for Complex Multi-Step Reasoning Across Domains

Recent advances in LLMs have paved the way for language agents that handle complex, multi-step tasks by calling external tools for precise execution. Existing language agents are dominated by proprietary models or task-specific designs, and these solutions often incur high cost and latency because they rely on APIs. Agents built on open-source LLMs tend to focus narrowly on multi-hop question answering or involve intricate training and inference procedures. Although LLMs have computational and factual limitations, language agents offer a promising way around them by methodically using external tools to address complicated challenges.

Researchers from the University of Washington, Meta AI, and the Allen Institute for AI introduced HUSKY, a versatile, open-source language agent designed to tackle diverse, complex tasks, including numerical, tabular, and knowledge-based reasoning. HUSKY operates in two stages: generating the next action to take and executing it with expert models. The agent uses a unified action space and integrates tools for code, math, search, and commonsense reasoning. Extensive testing shows that HUSKY, despite using smaller 7B models, outperforms larger, cutting-edge models on various benchmarks. It demonstrates a robust, scalable approach to solving multi-step reasoning tasks efficiently.
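To make the idea of a unified action space concrete, here is a minimal Python sketch of how a predicted action could be represented as a tool paired with a step description. The bracketed text format "[tool] step", the `Tool` enum, the `Action` class, and `parse_action` are illustrative assumptions, not taken from the HUSKY code base.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Tool(Enum):
    """The four tool families named in the article."""
    CODE = "code"
    MATH = "math"
    SEARCH = "search"
    COMMONSENSE = "commonsense"


@dataclass(frozen=True)
class Action:
    """One predicted action: which tool to use and what the step should do."""
    tool: Tool
    step: str


# Assumed textual format (hypothetical): "[tool] step description".
_ACTION_RE = re.compile(r"^\[(\w+)\]\s*(.+)$", re.DOTALL)


def parse_action(text: str) -> Action:
    """Parse a tagged string into an Action; raise ValueError if malformed."""
    match = _ACTION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a tagged action: {text!r}")
    tag, step = match.groups()
    # Tool(...) itself raises ValueError for a tag outside the action space.
    return Action(tool=Tool(tag.lower()), step=step.strip())
```

Because every task type shares the same small set of tools, a single parser like this could in principle serve numerical, tabular, and knowledge-based questions alike.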

Language agents have become crucial for solving complex tasks by using language models to create high-level plans or to assign tools to specific steps. They typically rely on either closed-source or open-source models. Earlier agents used proprietary models for planning and execution, which, while effective, is costly and inefficient because of the reliance on APIs. More recent work focuses on open-source models distilled from larger teacher models, which offer more control and efficiency but often specialize in narrow domains. Unlike these, HUSKY takes a broad, unified approach with a straightforward data curation process, using tools for code, math, search, and commonsense reasoning to address diverse tasks efficiently.

HUSKY is a language agent designed to solve complex, multi-step reasoning tasks through a two-stage process: predicting actions and executing them. An action generator determines the next step and the tool to use for it, and expert models then execute that action. The expert models handle tasks such as generating code, performing mathematical reasoning, and crafting search queries. HUSKY repeats this process until it reaches a final solution. Trained on synthetic data, HUSKY combines flexibility and efficiency across diverse domains. It is evaluated on datasets requiring varied tools, including HUSKYQA, a new dataset designed to test numerical reasoning and information retrieval abilities.
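The predict-then-execute loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_agent`, the callable signatures, the use of `None` as the "finished" signal, and the `max_steps` cap are all hypothetical, and the toy generator and expert stand in for the fine-tuned models the real system uses.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (tool name, step text); a tool of None means the step text is the final answer.
Action = Tuple[Optional[str], str]
# History of (step, output) pairs from earlier iterations.
History = List[Tuple[str, str]]
ActionGenerator = Callable[[str, History], Action]
Expert = Callable[[str, History], str]


def run_agent(
    question: str,
    generate_action: ActionGenerator,
    experts: Dict[str, Expert],
    max_steps: int = 10,
) -> str:
    """Alternate between predicting an action and executing it with an expert."""
    history: History = []
    for _ in range(max_steps):
        # Stage 1: predict the next step and the tool that should handle it.
        tool, step = generate_action(question, history)
        if tool is None:
            return step  # the generator signalled that this is the answer
        if tool not in experts:
            raise KeyError(f"no expert registered for tool {tool!r}")
        # Stage 2: hand the step to the matching expert and record its output.
        output = experts[tool](step, history)
        history.append((step, output))
    raise RuntimeError("no final answer within max_steps")


# Toy stand-ins (hypothetical) for the action generator and a math expert.
def toy_generator(question: str, history: History) -> Action:
    if not history:
        return ("math", "Multiply 12 by 7")
    return (None, history[-1][1])  # finish with the last expert output


def toy_math_expert(step: str, history: History) -> str:
    return str(12 * 7)


answer = run_agent("What is 12 times 7?", toy_generator, {"math": toy_math_expert})
print(answer)
```

Keeping the generator and the experts behind plain callables means any of them could be swapped for a different model without touching the loop; the step cap is an added safeguard against a generator that never finishes.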

HUSKY is evaluated on diverse tasks involving numerical, tabular, and knowledge-based reasoning, plus mixed-tool tasks. Trained on datasets such as GSM-8K, MATH, and FinQA, HUSKY shows strong zero-shot performance on unseen tasks, consistently outperforming other agents such as REACT and CHAMELEON, as well as proprietary models like GPT-4. The agent integrates tools and modules tailored to specific reasoning tasks, building on fine-tuned models such as LLAMA and DeepSeekMath. This enables precise, step-by-step problem-solving across domains and highlights HUSKY’s capabilities in multi-tool usage and iterative task decomposition.

In conclusion, HUSKY is an open-source language agent designed to tackle complex, multi-step reasoning tasks across various domains, including numerical, tabular, and knowledge-based reasoning. It takes a unified approach built around an action generator, fine-tuned from a strong base model, that predicts steps and selects appropriate tools. Experiments show that HUSKY performs robustly across tasks and benefits from both domain-specific and cross-domain training. Variants that use different specialized models for code and math reasoning highlight the impact of model choice on performance. HUSKY’s flexible and scalable architecture is positioned to handle increasingly diverse reasoning challenges and provides a blueprint for developing advanced language agents.


