AlphaProof Paper

7 min read | Saved February 14, 2026

alphaproof | mathematical-olympiad | lean | reinforcement-learning | proofs

Do you care about this?

This article details the development of AlphaProof, a system that uses reinforcement learning and the Lean programming language to automate the discovery of mathematical proofs. It highlights AlphaProof's success on problems from the International Mathematical Olympiad 2024, including a challenging problem that only a few human participants solved.

If you do, here's more

AlphaProof is a system developed to find Lean proofs for problems from the International Mathematical Olympiad (IMO), and it is best known for a silver-medal-level performance in 2024. The IMO is a prestigious annual competition in which more than 100 countries each send a team of up to six top high-school math students. Contestants solve six problems across various mathematical fields, each worth 7 points for a maximum of 42, and only a small percentage achieve top scores. In 2024, the gold medal cutoff was 29 points, reflecting the difficulty of certain problems that year.

The Lean programming language plays a crucial role in AlphaProof: every candidate proof is checked by Lean, so an accepted proof is verifiably correct. Lean proofs are concise and resemble both traditional mathematics and program code. The underlying model is pretrained on a curated dataset of code and mathematical data, using next-token prediction and span reconstruction objectives. After this extensive pretraining, the model could generate Lean proofs effectively.
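To give a feel for what a short Lean proof looks like, here is a minimal illustrative example in Lean 4. It is not taken from the AlphaProof paper; the theorem name `and_swap_example` and the statement are chosen here purely for illustration. The `constructor` tactic splits the goal `q ∧ p` into two sub-goals, and each `exact` line closes one of them.

```lean
-- Illustrative only: a small Lean 4 proof, not an example from the article.
-- Given a proof h of (p ∧ q), produce a proof of (q ∧ p).
theorem and_swap_example (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor      -- splits the goal into two sub-goals: q and p
  · exact h.2      -- first sub-goal: q is the right component of h
  · exact h.1      -- second sub-goal: p is the left component of h
```

Lean accepts the theorem only if every sub-goal is closed, which is what makes a finished proof verifiable by machine.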

AlphaProof employs a tree search akin to AlphaZero's, in which a complex goal is divided into smaller sub-goals. Product nodes in the search tree represent a step whose sub-goals must all be proven. They let the search handle several sub-goals together and concentrate effort on the hardest sub-proof first, which improves how the search scales. For reinforcement learning, the system formalizes natural-language theorem statements into Lean, significantly enlarging the training dataset. This formalization process is stochastic and generates many formalizations of each statement. Even imperfect formalizations still provide valuable training data for discovering valid mathematical proofs.
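The idea of product nodes can be sketched with a toy search in Python. This is a simplified depth-first illustration, not AlphaProof's actual AlphaZero-style search: the `TACTICS` table, the `DIFFICULTY` scores (standing in for a learned value estimate), and the `prove` function are all hypothetical names invented for this sketch. A goal is proven if any one tactic works, and a tactic works only if all of the sub-goals it leaves open are proven, with the hardest sub-goal tried first.

```python
# Hypothetical toy tactic table: each goal maps to candidate tactics, and each
# tactic maps to the sub-goals it leaves open (an empty list closes the goal).
TACTICS = {
    "q AND p": {"constructor": ["q", "p"]},
    "q": {"exact h.2": []},
    "p": {"bad_tactic": ["unprovable"], "exact h.1": []},
    "unprovable": {},
}

# Hypothetical difficulty scores standing in for a learned value estimate.
DIFFICULTY = {"q AND p": 2.0, "q": 1.0, "p": 1.5, "unprovable": 99.0}


def prove(goal, depth=0, max_depth=8):
    """Return a proof tree (tactic, [sub-proofs]) or None if none is found."""
    if depth > max_depth:
        return None
    # Choice point: any single tactic that succeeds proves the goal.
    for tactic, subgoals in TACTICS.get(goal, {}).items():
        # Product node: every sub-goal must be proven. Try the hardest one
        # first so that a doomed tactic is abandoned as early as possible.
        ordered = sorted(subgoals, key=lambda g: DIFFICULTY.get(g, 0.0), reverse=True)
        subproofs = {}
        for sub in ordered:
            proof = prove(sub, depth + 1, max_depth)
            if proof is None:
                break  # one failed sub-goal sinks the whole product node
            subproofs[sub] = proof
        else:
            # All sub-goals proven: report sub-proofs in their original order.
            return (tactic, [subproofs[g] for g in subgoals])
    return None


if __name__ == "__main__":
    print(prove("q AND p"))
```

In this sketch, `prove("p")` first tries `bad_tactic`, fails on its unprovable sub-goal, and falls back to `exact h.1`. A real system would replace the fixed tables with a neural network that proposes tactics and estimates difficulty, and would allocate search effort more carefully than plain depth-first recursion.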
