Large language models (LLMs) have proven exceptionally effective on downstream natural language processing (NLP) tasks. Pioneering models such as GPT-4 and ChatGPT have been trained on vast volumes of text data to generate coherent, contextually relevant responses, and their text comprehension and generation abilities make them flexible enough for a wide range of NLP applications. It is commonly believed, however, that LLMs struggle to perform complex arithmetic accurately, such as multiplying numbers with more than eight digits or carrying out operations involving decimals or fractions. While GPT-4 has shown outstanding capabilities across various NLP tasks, it may not demonstrate the same degree of proficiency in mathematical reasoning.
Researchers from Tsinghua University, TAL AI Lab, and Zhipu.AI investigate the mathematical skills of LLMs in an effort to challenge this belief. Their recent work proposes MathGLM, a robust model carefully constructed to execute a broad spectrum of difficult arithmetic operations. According to the authors, it achieves the best performance when compared with industry-leading LLMs like GPT-4. The operations covered are addition, subtraction, multiplication, division, and exponentiation, along with expressions that use brackets to combine several of these operations. The simplest case is the “1-atomic operation”, a single operation carried out on its own without being combined with any other. Most notably, MathGLM can perform arithmetic on any number type, whether integers, decimals, fractions, percentages, or negative numbers.
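To make the range of number types concrete, the sketch below computes a single arithmetic operation exactly over integers, decimals, fractions, percentages, and negative numbers. It is only an illustration of the kind of task described above, not MathGLM's code or data format; the names parse_number and atomic_operation, the string input format, and the use of "^" for exponentiation are assumptions made for this example.

```python
from fractions import Fraction


def parse_number(text):
    """Parse an integer, decimal, fraction 'a/b', or percentage 'x%' exactly.

    Hypothetical helper for illustration; not part of MathGLM.
    """
    text = text.strip()
    if text.endswith("%"):
        # A percentage is the stated value divided by 100.
        return Fraction(text[:-1]) / 100
    return Fraction(text)


def atomic_operation(left, op, right):
    """Apply one operation to two operands, with no other operation involved."""
    a, b = parse_number(left), parse_number(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        return a / b
    if op == "^":
        # Exact only when the exponent is a whole number.
        return a ** b
    raise ValueError(f"unsupported operator: {op}")


print(atomic_operation("3/4", "+", "25%"))   # fraction plus percentage
print(atomic_operation("-2.5", "*", "4"))    # negative decimal times integer
```

Exact rational arithmetic is used here so that the reference answers do not suffer from floating-point rounding, which matters when checking a model's output digit by digit.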
The Ape210K dataset collects math word problems from across the Internet and provides a comprehensive source of mathematical problems. It is useful for training MathGLM because it covers a variety of problem types. In the original dataset, each answer is given as a directly calculated result. However, the team highlights that training MathGLM to present answers in this no-frills way may cause it to miss important underlying calculation rules and patterns.
To overcome this possible shortcoming and improve MathGLM’s ability to solve math word problems, the researchers reconstruct the Ape210K dataset using a step-by-step strategy. By breaking the complex arithmetic calculation down into a series of sequential steps, MathGLM can generate answers to math word problems with high accuracy.
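The idea of breaking a calculation into sequential steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the researchers' actual reconstruction procedure: the function name solve_step_by_step and the "a op b = c" step format are hypothetical, and only +, -, *, / and brackets are handled.

```python
import ast
from fractions import Fraction

# Map Python AST operator nodes to a printable symbol and a function.
OPS = {
    ast.Add: ("+", lambda a, b: a + b),
    ast.Sub: ("-", lambda a, b: a - b),
    ast.Mult: ("*", lambda a, b: a * b),
    ast.Div: ("/", lambda a, b: a / b),
}


def solve_step_by_step(expression):
    """Return (steps, result) for an expression using + - * / and brackets.

    Each step records one operation on already-computed values, so the
    final answer is reached through a sequence of intermediate results.
    """
    steps = []

    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            # Go through str() so that decimals such as 0.5 stay exact.
            return Fraction(str(node.value))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = evaluate(node.left)
            right = evaluate(node.right)
            symbol, fn = OPS[type(node.op)]
            value = fn(left, right)
            steps.append(f"{left} {symbol} {right} = {value}")
            return value
        raise ValueError("unsupported expression")

    result = evaluate(ast.parse(expression, mode="eval").body)
    return steps, result


steps, result = solve_step_by_step("(3 + 5) * 2 - 4 / 8")
for line in steps:
    print(line)
# 3 + 5 = 8
# 8 * 2 = 16
# 4 / 8 = 1/2
# 16 - 1/2 = 31/2
```

A training target built this way exposes every intermediate value rather than only the final answer, which is the property the step-by-step reconstruction is meant to provide.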
The team’s extensive experiments and in-depth analysis demonstrate MathGLM’s superior mathematical reasoning over GPT-4. Fine-tuning on the reconstructed step-by-step dataset delivers an absolute gain of 42.29% in answer accuracy compared to fine-tuning on the original dataset. After being fine-tuned from GLM-10B, MathGLM performs very close to GPT-4 on a dataset of 5,000 math word problems. By breaking arithmetic word problems down into their constituent steps, MathGLM can better follow the intricate calculation process, learn the underlying calculation rules, and produce more reliable results.
These findings challenge the conventional wisdom that LLMs cannot handle difficult arithmetic tasks and point to their strong capacity for mathematical reasoning.
Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.
Dhanshree Shenwai is a Computer Science Engineer with experience in FinTech companies covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone’s life easier in today’s evolving world.