Can LLMs Generate Mathematical Proofs that can be Rigorously Checked? Meet LeanDojo: An Open-Source AI Playground With Toolkits, Benchmarks, and Models for Large Language Models to Prove Formal Theorems in the Lean Proof Assistant

Artificial intelligence and machine learning are among today's fastest-moving fields, and reasoning remains a central goal of AI research. One long-studied problem is automated theorem proving (ATP): automatically producing proofs for theorems stated in formal logic. Because ATP must contend with an enormous search space, interactive theorem proving (ITP) emerged as an alternative paradigm, in which human experts construct proofs by interacting with software tools called proof assistants.

Large language models (LLMs), despite their remarkable code-generation capabilities, also struggle with theorem proving because of factual errors and hallucination. To address these limitations, a team of researchers from Caltech, NVIDIA, MIT, UC Santa Barbara, and UT Austin has introduced LeanDojo, an open-source toolkit for LLM-based theorem proving. LeanDojo is built around the Lean proof assistant, which is popular among mathematicians, and offers resources for working with Lean and extracting data from it.

For data extraction, LeanDojo gathers training data from proof trees and intermediate proof states that are not immediately evident in the raw Lean code. It also enables models to interact with Lean programmatically: a model can observe proof states, carry out proof actions (tactics), and receive feedback from Lean. This open-source Lean playground comprises toolkits, data, models, and benchmarks, supporting both programmatic interaction with the proof environment and data extraction from Lean.
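To make the idea of proof states and tactics concrete, here is a minimal Lean 4 example (written for this article, not taken from LeanDojo's dataset). Each tactic transforms the current proof state, and it is exactly these intermediate states that LeanDojo records as training data:

```lean
-- Goal before any tactic runs: ⊢ a + b = b + a
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  -- The tactic `rw [Nat.add_comm]` rewrites the goal using the
  -- library premise Nat.add_comm : n + m = m + n.
  rw [Nat.add_comm]
  -- No goals remain: the proof is complete.
```

From a trace of this proof, a model can learn the mapping from the proof state `⊢ a + b = b + a` to the tactic `rw [Nat.add_comm]`, including which library premise that tactic relies on.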

LeanDojo provides fine-grained annotations of the premises used in proofs, which is valuable for premise selection, a critical bottleneck in theorem proving. Using LeanDojo's data extraction capabilities, the researchers have also developed ReProver, the first LLM-based prover augmented with retrieval for selecting premises from a large math library. Unlike previous methods that depended on private datasets and substantial computational resources, ReProver is designed to be accessible and cost-effective: it can be trained in about one week on a single GPU.
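The core idea of retrieval-augmented premise selection is to embed the current proof state and every candidate premise, then hand the most similar premises to the prover. The sketch below is a simplified, hypothetical illustration: it substitutes a bag-of-words cosine similarity for ReProver's learned encoder, and all names and premise strings are made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a learned encoder: bag-of-words token counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_premises(proof_state: str, premises: list[str], k: int = 2) -> list[str]:
    # Rank candidate premises by similarity to the current proof state
    # and return the top k for the prover to condition on.
    state_vec = embed(proof_state)
    ranked = sorted(premises, key=lambda p: cosine(state_vec, embed(p)), reverse=True)
    return ranked[:k]


# Hypothetical miniature math library of premise statements.
library = [
    "nat.add_comm : a + b = b + a",
    "nat.mul_comm : a * b = b * a",
    "list.length_append : (l1 ++ l2).length = l1.length + l2.length",
]

top = retrieve_premises("goal : a + b = b + a", library, k=1)
# → ["nat.add_comm : a + b = b + a"]
```

A real retriever ranks tens of thousands of premises with dense embeddings rather than word counts, but the pipeline shape, encode state, score premises, keep the top k, is the same.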

ReProver's retrieval mechanism uses LeanDojo's program analysis capability to identify which premises are accessible in the current context and to construct hard negative examples for training, which makes retrieval more effective and improves the prover's performance. For evaluation and further research, the team has developed a new benchmark dataset of 96,962 theorems and proofs extracted from Lean's math library. The benchmark features a challenging data split that requires the prover to generalize to theorems relying on novel premises never used during training. Experimental results show that ReProver outperforms non-retrieval baselines and GPT-4 when trained and evaluated on this benchmark.
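The novel-premises split can be understood as a filter: a held-out theorem counts toward the challenging test set only if its proof uses at least one premise that never appears in any training proof. The sketch below is an illustrative reconstruction of that idea, not LeanDojo's actual implementation; theorem and premise names are invented.

```python
def split_by_novel_premises(theorems: dict[str, set[str]], train_names: set[str]) -> list[str]:
    """theorems maps each theorem name to the set of premises its proof uses."""
    # Collect every premise that appears in some training proof.
    train_premises: set[str] = set()
    for name in train_names:
        train_premises |= theorems[name]
    # A held-out theorem qualifies for the novel-premises test set only if
    # its proof relies on at least one premise absent from training.
    test = [
        name for name in theorems
        if name not in train_names and theorems[name] - train_premises
    ]
    return sorted(test)


corpus = {
    "thm_a": {"add_comm", "add_assoc"},
    "thm_b": {"mul_comm"},
    "thm_c": {"add_comm"},         # uses only training premises -> excluded
    "thm_d": {"list_len_append"},  # uses a novel premise -> test set
}
novel_test = split_by_novel_premises(corpus, train_names={"thm_a", "thm_b"})
# → ["thm_d"]
```

Under a random split, a prover can score well by memorizing which premises tend to co-occur with which theorems; this split removes that shortcut, since test-set proofs require premises the model has never seen used.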

In conclusion, this open-source approach to LLM-based theorem proving looks promising. It removes the barriers of private code, private data, and heavy compute requirements by providing accessible toolkits, data, models, and benchmarks.

Check out the Paper, GitHub, and Project Page.


Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
