Machine-Checked Mathematics in the Age of AI

Leo de Moura
Senior Principal Applied Scientist, AWS
Chief Architect, Lean FRO

Stanford Future of Mathematics Symposium | May 1, 2026

"Vibe Proving"

Alexeev & Mixon resolved a $1000 Erdős prize problem.

"We used ChatGPT to vibe code a Lean proof."

The proof is machine-checked. Paper

Erdős prize paper

What is Lean?

A programming language and a proof assistant.
Same language for definitions and proofs.
Lean is implemented in Lean. Very extensible.
The math community made Lean their own: 50,000+ lines of extensions.
Lean is scalable.
Small trusted kernel. Proofs can be exported and independently checked.

Theorem Proving Is a Game

"You have written my favorite computer game." — Kevin Buzzard

The "game board": you see goals and hypotheses, then apply "moves" (tactics).

def odd (n : Nat) : Prop := ∃ k, n = 2 * k + 1

theorem square_of_odd_is_odd : odd n → odd (n * n) := byn:Nat⊢ odd n → odd (n * n)
  intro ⟨k₁, e₁⟩n:Natk₁:Nate₁:n = 2 * k₁ + 1⊢ odd (n * n)
  simp [e₁, odd]n:Natk₁:Nate₁:n = 2 * k₁ + 1⊢ ∃ k, (2 * k₁ + 1) * (2 * k₁ + 1) = 2 * k + 1
  exists 2 * k₁ * k₁ + 2 * k₁n:Natk₁:Nate₁:n = 2 * k₁ + 1⊢ (2 * k₁ + 1) * (2 * k₁ + 1) = 2 * (2 * k₁ * k₁ + 2 * k₁) + 1
  liaAll goals completed! 🐙

Each tactic transforms the goal until nothing remains.

The kernel checks the final proof term.

The Kernel Catches Mistakes

example : odd 3 := ⟨2, by⊢ 3 = 2 * 2 + 1 decide⊢ 3 = 2 * 2 + 1Tactic `decide` proved that the proposition
  3 = 2 * 2 + 1
is false⟩

You don't need to trust me or my automation. You only need to trust the small kernel.

An incorrect proof is rejected immediately.

Lean has multiple independent kernels.

You can build your own and submit it to arena.lean-lang.org.

The State of Lean

250,000+ unique installations: VS Code (143K+) + Open VSX (107K+).

Ecosystem: 9,000+ GitHub repositories depending on Lean.

Libraries: Mathlib, PhysLean, CSLib.

Industry: AWS, Google, Microsoft, Mistral, Nethermind, Galois, others.

"Lean has become the de facto choice for AI-based systems of mathematical reasoning." — ACM SIGPLAN Award

The Lean Mathematical Library

Formal Mathematics = Machine-Checkable Mathematics.
270,000+ formalized theorems.
750+ contributors.
2.2M+ lines of Lean.
1,500+ type classes and 20,000+ instances.
Fewer than 1,000 definitions separate Mathlib from full modern-research-math coverage.

Mathlib dependency graph

"I’m investing time now so that somebody in the future can have that amazing experience." — Heather Macbeth, Prof. of Mathematics, Imperial College

The Liquid Tensor Experiment

In 2020, Peter Scholze posed a formalization challenge.

"I spent much of 2019 obsessed with the proof of this theorem, almost getting crazy over it. I still have some small lingering doubts." — Peter Scholze

Johan Commelin led a team that verified the proof, with only minor corrections.

They verified and simplified the proof without fully understanding it.

"The Lean Proof Assistant was really that: an assistant in navigating through the thick jungle that this proof is." — Peter Scholze

Nature article on LTE

Large Formalizations Need Project Infrastructure

Carleson's Theorem (completed) — van Doorn
Liquid Tensor Experiment (completed) — Commelin
Equational Theories Project, Tao (completed)
Sphere Packing — Birkbeck, Hariharan, Lee, Ma, Mehta, Viazovska
Fermat's Last Theorem — Buzzard
Inter-Universal Teichmüller Theory — Mochizuki

At this scale, mathematics needs build systems, dependency graphs, code review, release engineering, and a precise medium for communication.

Formal Math Is More Maintainable

Change a definition. Rebuild. The compiler identifies what broke.

The parts that still type-check do not need human revalidation.

Auto-generalization in Mathlib

Lean as an IDE

Real-time feedback: errors, goals, and hints as you type. The IDE is where we collaborate with AI now.

Claude Code: sum.lean with sorry

AI Proves the Theorem

AI found the proof and explained its strategy in natural language:

Claude Code: sum.lean proved

Interactive Walkthrough: Sum Formula

def sum (n : Nat) : Nat :=
  if n = 0 then
    0
  else
    n + sum (n-1)

theorem sum_eq : 2 * sum n = n * (n+1) := byn:Nat⊢ 2 * sum n = n * (n + 1)
  induction n with
  | zero =>zero⊢ 2 * sum 0 = 0 * (0 + 1) simp [sum]All goals completed! 🐙
  | succ n ih =>succn:Natih:2 * sum n = n * (n + 1)⊢ 2 * sum (n + 1) = (n + 1) * (n + 1 + 1)
    unfold sumsuccn:Natih:2 * sum n = n * (n + 1)⊢ (2 * if n + 1 = 0 then 0 else n + 1 + sum (n + 1 - 1)) = (n + 1) * (n + 1 + 1); simpsuccn:Natih:2 * sum n = n * (n + 1)⊢ 2 * (n + 1 + sum n) = (n + 1) * (n + 1 + 1)
    rw [Nat.mul_add, ih, ← Nat.add_mul]succn:Natih:2 * sum n = n * (n + 1)⊢ (2 + n) * (n + 1) = (n + 1) * (n + 1 + 1)
    rw [Nat.add_comm 2 n, Nat.mul_comm]All goals completed! 🐙

Reading vs Writing Formal Math

It is easier to read formal math than write it.
Formal math is easier to read than LaTeX.
You can click through, inspect definitions, check types.
AI can explain formal proofs and is making the writing part easier.

$Formal math in Lean$

Proof Assistants in the Age of AI

Verso: Written in Lean. Checked by Lean.

Future math papers will embed type-checked Lean code.
Lecture notes, papers, and slides — all in one tool.
These slides are written in Verso.
Every example is checked by Lean.
lean-lang.org is written in Verso.
My homepage is written in Verso.
Verso is a domain-specific language embedded in Lean.
Built by David Thrane Christiansen at the Lean FRO.

Verso: source and rendered output

Verso Blueprint: A Map for Large Formalizations

Massot's Lean Blueprints have been widely adopted in the Lean community.
Verso Blueprint is the next generation.
Large projects like FLT (50+ contributors) and Sphere Packing need more than proofs.
It tracks definitions, lemmas, and dependencies.
The blueprint is a Lean document with inline code that is type-checked.
When AI generates 200,000+ lines of formalization, humans need tools to visualize and understand what was built.
Built by Emilio Jesús Gallego Arias at the Lean FRO.
Emilio ported major projects onto it: FLT, Carleson, Sphere Packing.

verso-blueprint

Verso Blueprint: TeX and Lean in One Document

Verso Blueprint code mixing TeX and Lean

Verso Blueprint: Dependency Graph

Verso Blueprint dependency graph

Lean Verbose

Lean Verbose provides natural-language tactics to teach mathematics using Lean 4.

Playing the Lean game using mouse clicks and natural language.

Lean Verbose is implemented in Lean by Patrick Massot.

Lean Verbose

Lean Workbench (Coming Soon)

Zero installation. Manage Lean and Verso Blueprint projects in your web browser.
Pre-configured with AI.
Great for teaching. Students can focus on the course instead of the technology.
Instructors can ensure students use the correct libraries for the course.
Built by Jason Reed at the Lean FRO.

Lean Workbench

International Mathematical Olympiad (IMO)

Every medal-level IMO AI with formal proofs uses Lean.

AlphaProof (Google DeepMind) — silver medal, IMO 2024
Aristotle (Harmonic) — gold medal, IMO 2025
Seed Prover (ByteDance) — silver medal, IMO 2025

AI Startups

Leanstral (Mistral): the first open-source code agent designed for Lean 4.
Axiom: solved 12/12 problems on Putnam 2025.
DeepSeek Prover-V2: 88.9% on miniF2F. Open-source, 671B parameters.
Harmonic: built the Aristotle AI — gold medal, IMO 2025.

Startups using Lean

The Kernel as AI Safety Infrastructure

The kernel's soundness is essential for AI safety.

Reinforcement learning agents will exploit every shortcut.

Validating a Lean Proof

Who Watches the Provers?

Comparator

Comparator is a judge for Lean proofs.

A challenge is a Lean file with sorrys to be filled.

def large : Nat := sorry

theorem large_lt : 37 < large := sorry

A solution replaces the sorrys with Lean terms and proofs.

def large : Nat := 38

theorem large_lt : 37 < large := by⊢ 37 < large decideAll goals completed! 🐙

Comparator checks whether the solution is valid.

Lean Eval

Lean Eval — Built by Kim Morrison at the Lean FRO

Lean Eval leaderboard

The Lean FRO

Nonprofit. 20 engineers. Open source. Controlled by no single company.

Sebastian Ullrich and I launched it in August 2023.

29 releases and 8,500+ pull requests merged.

Lean is not a research project anymore. We run the Lean FRO as a startup.

Public roadmaps | Our team

The Lean project was featured in NY Times, Quanta, Scientific American, Wired, and others.

Scalability Is the Real Test

Today's Mathlib builds and stays interactive.

The open question is whether editor feedback remains interactive at 100x scale.

AI makes the scaling problem larger: more proof attempts, more generated code, more modules.

The proof assistant is the environment where humans and AI collaborate.

Every subsystem must scale.

Conclusion

We opened with Alexeev and Mixon resolving an Erdős prize problem — a Lean proof "vibe coded" through ChatGPT.

Formalization is not just about proving. It is about understanding, communicating, and navigating structures beyond our cognitive abilities.

AI enables more people to use and benefit from formal mathematics.

The discipline thrives when no one has to "take it on faith."

AI changes how we build proof assistants. It does not change what makes them good.

lean-lang.org | live.lean-lang.org | Lean Zulip | Reference Manual

Thank You

Leo de Moura

lean-lang.org | leanprover.zulipchat.com