Meet MAmmoTH: A Series of Open-Source Large Language Models (LLMs) Specifically Tailored for General Math Problem-Solving

Mathematical reasoning is a core capability of modern large language models (LLMs) and the primary focus of this work. Despite recent progress, a clear divide remains between closed-source and open-source LLMs: closed-source models like GPT-4, PaLM-2, and Claude 2 dominate popular mathematical reasoning benchmarks such as GSM8K and MATH, while open-source models like Llama, Falcon, and OPT fall far behind.

There are two main approaches to closing this gap: 

  • Continued pre-training, as with Galactica and MINERVA, which keeps training an LLM on more than 100B tokens of math-related web data. This improves a model's general scientific reasoning ability, but it is computationally expensive.
  • Dataset-specific fine-tuning, as with rejection sampling fine-tuning (RFT) and WizardMath, which fine-tunes an LLM on supervised data specific to each dataset. These methods are effective within their target domain, but they do not transfer to other areas of mathematics that require reasoning.

Recent research by the University of Waterloo, the Ohio State University, HKUST, the University of Edinburgh, and IN.AI explores a lightweight yet generalizable math instruction-tuning technique that improves LLMs' mathematical reasoning in general, not just on the tasks used for fine-tuning.

Current approaches rely heavily on Chain-of-Thought (CoT) prompting, in which the model works through a math problem in natural-language steps. CoT falls short when it comes to computational precision and to complex mathematical or algorithmic reasoning. Code-based techniques like Program-of-Thought (PoT) and PAL instead use external tools to simplify the math-solving procedure.

These code-based methods delegate computationally intensive steps (such as solving quadratic equations with sympy or calculating matrix eigenvalues with numpy) to an external Python interpreter. PoT, however, has limitations in more abstract reasoning scenarios, such as commonsense reasoning, formal logic, and abstract algebra, especially when no suitable built-in APIs exist.
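To make the delegation idea concrete, here is a minimal sketch of the program-of-thought pattern. The generated program, the `run_pot` helper, and the convention of storing the result in a variable named `answer` are illustrative assumptions, not code from the paper. It uses only the standard library instead of sympy or numpy so that it runs without extra dependencies.

```python
# Hypothetical model output: a program-of-thought rationale emitted as Python
# source. It solves x^2 - 5x + 6 = 0 and stores the result in `answer`.
GENERATED_PROGRAM = """
import math
a, b, c = 1, -5, 6
disc = b * b - 4 * a * c
root_lo = (-b - math.sqrt(disc)) / (2 * a)
root_hi = (-b + math.sqrt(disc)) / (2 * a)
answer = sorted([root_lo, root_hi])
"""


def run_pot(program: str):
    """Execute a generated program in a fresh namespace and return `answer`.

    Assumption: the program defines a variable called `answer`.
    exec() is unsafe for untrusted code; a real system would sandbox it.
    """
    namespace = {}
    exec(program, namespace)
    return namespace["answer"]


print(run_pot(GENERATED_PROGRAM))  # [2.0, 3.0]
```

The model only has to write the program correctly; the interpreter does the arithmetic, which is where CoT-style reasoning tends to slip.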

To take advantage of the benefits of both CoT and PoT, the team presents a novel hybrid instruction-tuning dataset for mathematics called MathInstruct. Its primary features are:

  1. Comprehensive coverage of a variety of mathematical areas and complexity levels
  2. Hybrid CoT and PoT rationales
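As a rough illustration of what "hybrid rationales" could look like in practice, the sketch below pairs one question with a CoT-style answer and a PoT-style answer and renders both as training text. The `MathExample` class, its field names, the prompt template, and the toy question are all hypothetical; the actual MathInstruct schema and prompts may differ.

```python
from dataclasses import dataclass


@dataclass
class MathExample:
    """One instruction-tuning record (hypothetical schema)."""
    question: str
    rationale: str
    style: str  # "cot" for natural-language steps, "pot" for a program


# Toy records invented for illustration; not taken from MathInstruct.
examples = [
    MathExample(
        question="What is 15% of 80?",
        rationale="15% of 80 is 0.15 * 80 = 12. The answer is 12.",
        style="cot",
    ),
    MathExample(
        question="What is 15% of 80?",
        rationale="answer = 0.15 * 80\nprint(answer)",
        style="pot",
    ),
]


def to_training_text(ex: MathExample) -> str:
    """Render a record as a prompt/response pair (assumed template)."""
    instruction = ex.question
    if ex.style == "pot":
        # Assumed cue asking the model to respond with a program.
        instruction += " Write a Python program to solve it."
    return f"### Instruction:\n{instruction}\n\n### Response:\n{ex.rationale}"


for ex in examples:
    print(to_training_text(ex))
    print("-" * 40)
```

Mixing both styles in one training set is what lets a single model answer in prose when a problem is abstract and in code when it is computation-heavy.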

MathInstruct's rationales are drawn from thirteen datasets: six newly curated and seven pre-existing. From a modeling standpoint, the researchers train and evaluate approximately 50 unique models, with base models ranging from 7B to 70B parameters, to study the effects of different input-output formats and data sources.

The resulting models, named MAmmoTH, show strong promise as mathematical generalists.

The researchers test MAmmoTH on a wide variety of datasets, both in-domain (IND) and out-of-domain (OOD), including GSM8K, MATH, AQuA-RAT, and NumGLUE. The models substantially improve the performance of open-source LLMs on mathematical reasoning and generalize better to OOD datasets than state-of-the-art approaches. On the popular competition-level MATH dataset, the 7B model outperforms WizardMath (the open-source MATH SoTA) by more than a factor of 3 (35.2% vs. 10.7%), while the 34B MAmmoTH-Coder (tuned on Code Llama) outperforms GPT-4 (using CoT). Both model families, MAmmoTH and MAmmoTH-Coder, improve on the accuracy of previously available open-source models by significant margins.

Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers on this project.

Dhanshree Shenwai is a Computer Science Engineer with experience at FinTech companies spanning the financial, cards and payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier in today's evolving world.
