Google AI Introduces Minerva: A Natural Language Processing (NLP) Model That Solves Mathematical Questions

Large language models are widely used for a range of natural language tasks, such as question answering, commonsense reasoning, and summarization. However, these models have struggled with tasks that require quantitative reasoning, such as solving problems in mathematics, physics, and engineering.

Quantitative reasoning is an intriguing application for language models because it tests them in several ways. Solving mathematical and scientific problems requires accurately parsing a query that mixes natural language and mathematical notation, recalling relevant formulas and constants, and producing step-by-step solutions that involve numerical computation and symbolic manipulation. For this reason, researchers have believed that machine learning models would need significant improvements in model architecture and training methods to solve such reasoning problems.

New research from Google introduces Minerva, a language model that answers mathematical and scientific questions using step-by-step reasoning. Minerva solves such problems by generating solutions that include numerical calculations and symbolic manipulation.

The researchers' findings show that performance on a range of challenging quantitative reasoning tasks improves significantly when they focus on collecting training data relevant to quantitative reasoning, train models at scale, and use best-in-class inference techniques.

The researchers trained Minerva on a 118GB dataset of scientific papers from the arXiv preprint server and web pages containing mathematical expressions in LaTeX, MathJax, or other formats. Because symbols and formatting are crucial to the semantic meaning of mathematical expressions, the training data preserves this information. This allows the model to communicate using standard mathematical notation.
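The article does not describe the exact preprocessing pipeline, so the following is only a minimal sketch of the general idea, assuming pages that embed math the way MathJax v2 does, inside `<script type="math/tex">` tags. The hypothetical `html_to_text_keep_math` function strips ordinary HTML markup but rewrites each math script as inline `$...$` or display `$$...$$` LaTeX instead of discarding it, so the symbols survive into the training text.

```python
import html
import re

# One pass over the page: the first alternative matches a MathJax (v2) math
# script (assuming this exact attribute spelling), the second any other tag.
TOKEN = re.compile(
    r'<script type="math/tex(?P<display>; ?mode=display)?">(?P<tex>.*?)</script>'
    r"|<[^>]+>",
    re.DOTALL,
)


def html_to_text_keep_math(page: str) -> str:
    """Hypothetical cleaner (not the paper's pipeline): drop HTML markup,
    keep TeX math as $...$ (inline) or $$...$$ (display)."""
    parts = []
    pos = 0
    for match in TOKEN.finditer(page):
        # Plain text between tokens: decode entities such as &lt; and &amp;.
        parts.append(html.unescape(page[pos:match.start()]))
        tex = match.group("tex")
        if tex is not None:
            # Math script: keep the TeX source verbatim and add delimiters.
            tex = tex.strip()
            parts.append(f"$${tex}$$" if match.group("display") else f"${tex}$")
        # Any other tag contributes nothing to the output.
        pos = match.end()
    parts.append(html.unescape(page[pos:]))
    return "".join(parts).strip()


page = '<p>The area of a circle is <script type="math/tex">\\pi r^2</script>.</p>'
print(html_to_text_keep_math(page))  # The area of a circle is $\pi r^2$.
```

Rewriting the math tokens before anything else matters because extractors that discard `<script>` contents would otherwise lose the formula altogether. This sketch leaves the body of non-math `<script>` tags in place; a real pipeline would need to handle those as well.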

To answer mathematical problems more effectively, Minerva also uses recent prompting and evaluation techniques, including chain-of-thought (scratchpad) prompting and majority voting. Like most language models, Minerva assigns probabilities to different possible outputs. When answering a question, it generates multiple solutions by sampling stochastically from these outputs. Although the intermediate steps of these solutions differ, they often arrive at the same final answer. Minerva then uses majority voting to select the most common answer as its final result.
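Majority voting itself is simple to express. The following is a minimal sketch, assuming the model has already been sampled several times with a chain-of-thought prompt and that each sampled solution ends with a sentence of the form "The final answer is ...". That answer format and the helper names `extract_final_answer` and `majority_vote` are hypothetical, chosen for illustration, and are not Minerva's actual code.

```python
from collections import Counter
from typing import Optional


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the text after the last "final answer is" (assumed format)."""
    marker = "final answer is"
    idx = solution.rfind(marker)
    if idx == -1:
        return None  # the sample never states a final answer
    answer = solution[idx + len(marker):].strip()
    # Drop a trailing period and surrounding LaTeX $...$ delimiters.
    answer = answer.rstrip(".").strip().strip("$").strip()
    return answer or None


def majority_vote(samples: list[str]) -> Optional[str]:
    """Pick the most frequent final answer across sampled solutions."""
    answers = [a for a in map(extract_final_answer, samples) if a is not None]
    if not answers:
        return None
    # Intermediate steps are ignored: only final answers are counted.
    # Counter.most_common breaks ties in favor of the answer seen first.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical samples for: "48 apples are split equally into 4 boxes.
# How many apples are in each box?"
samples = [
    "48 / 4 = 12 apples per box. The final answer is 12.",
    "Each box holds 48 - 36 = 12 apples. The final answer is $12$.",
    "48 / 3 = 16 apples per box. The final answer is 16.",
    "I could not finish this problem.",
]
print(majority_vote(samples))  # 12
```

The second sample reaches 12 by a different route, yet it still counts toward the same answer, since only the final result is voted on.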


The researchers evaluated Minerva's numerical reasoning skills on STEM benchmarks ranging in difficulty from grade-school problems to graduate-level coursework. These benchmarks included:

  • MATH, a dataset of problems from high school math competitions
  • MMLU-STEM, a subset of the Massive Multitask Language Understanding benchmark focusing on STEM subjects at the high school and college levels, including engineering, chemistry, math, and physics.
  • GSM8k, a set of grade-school math word problems that require basic arithmetic operations
  • OCWCourses, a set of college- and graduate-level problems from MIT OpenCourseWare covering a range of STEM subjects such as solid-state chemistry, astrophysics, differential equations, and special relativity.

Minerva consistently achieved state-of-the-art results on these benchmarks, in some cases by a large margin.

In their paper, the team notes that their approach to quantitative reasoning is not grounded in formal mathematics. Minerva parses questions and generates answers as a mix of natural language and LaTeX expressions, with no explicit underlying mathematical structure. The researchers identify a significant drawback of this approach: the model's answers cannot be verified automatically. Even when the final answer is known and can be checked, the model may reach it through flawed reasoning steps that cannot be detected automatically.
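The limitation is easy to see with a final-answer grader. The sketch below is hypothetical: the helper names `answers_match` and `grade` and the "The final answer is ..." format are assumptions, not the paper's evaluation code. It compares only the final answer, normalized to an exact fraction where possible, so a solution that reaches 1/4 through the invalid step of "cancelling" the 6s in 16/64 is graded exactly like a correct one.

```python
from fractions import Fraction


def answers_match(predicted: str, reference: str) -> bool:
    """Compare answers as exact rationals when both parse, else as strings."""
    try:
        return Fraction(predicted.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return predicted.strip() == reference.strip()


def grade(solution: str, reference: str) -> bool:
    """Hypothetical grader: inspects only the final answer, never the steps."""
    _, marker, tail = solution.rpartition("The final answer is")
    if not marker:
        return False  # no final answer stated
    return answers_match(tail.strip().rstrip("."), reference)


valid = "Divide numerator and denominator by 16: 16/64 = 1/4. The final answer is 1/4."
flawed = "Cancel the 6 in 16/64 to get 1/4. The final answer is 1/4."

# Both pass: the invalid cancellation step is invisible to the grader.
print(grade(valid, "0.25"), grade(flawed, "0.25"))  # True True
```

Checking the reasoning itself would require a representation of each step that a machine can verify, which free-form natural language and LaTeX do not provide.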

Machine learning models are valuable tools in many scientific fields, but they are often built to solve narrow, specific problems. The team hopes that a model capable of general quantitative reasoning will open up new opportunities for researchers and students.

This article is a summary written by Marktechpost Staff based on the paper 'Solving Quantitative Reasoning Problems with Language Models'. All credit for this research goes to the researchers on this project. Check out the paper and blog post.

Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring new advancements in technologies and their real-life applications.
