

A general-purpose AI, with no special training for theorem proving, converted zlib (a C compression library) to Lean, passed the test suite, and then proved that the code is correct.
Not tested. Proved. For every possible input. lean-zip
"This was not expected to be possible yet."
How? Let me show you.

A programming language and a proof assistant
Same language for code and proofs. No translation layer.
Small trusted kernel. Proofs can be exported and independently checked.
Lean is implemented in Lean. Very extensible.
The math community made Lean their own: >50,000 lines of extensions.
def reverse : List α → List α
  | [] => []
  | x :: xs => reverse xs ++ [x]

theorem reverse_append (xs ys : List α)
    : reverse (xs ++ ys) = reverse ys ++ reverse xs := by
  induction xs with
  | nil => simp [reverse]
  | cons x xs ih => simp [reverse, ih]

#eval reverse [1, 2, 3]
-- [3, 2, 1]

def odd (n : Nat) : Prop := ∃ k, n = 2 * k + 1
The type of odd 3 is Prop, a statement that can be true or false.
We can't evaluate it: odd returns a Prop, not a Bool.
But we can prove it by providing a witness.
example : odd 3 := ⟨1, by decide⟩
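For contrast, a Boolean version of the same predicate can be evaluated directly. A small sketch (the name `oddb` is mine, not from the talk):

```lean
-- `oddb` is a hypothetical Boolean counterpart of `odd`:
-- a computation rather than a proposition.
def oddb (n : Nat) : Bool := n % 2 == 1

#eval oddb 3  -- true

-- `decide` bridges the two worlds: it evaluates a decidable
-- proposition and turns the result into a proof.
example : 3 % 2 = 1 := by decide
```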

"You have written my favorite computer game" — Kevin Buzzard
The "game board": you see goals and hypotheses, then apply "moves" (tactics).
theorem square_of_odd_is_odd : odd n → odd (n * n) := by
  intro ⟨k₁, e₁⟩
  -- e₁ : n = 2 * k₁ + 1 ⊢ odd (n * n)
  simp [e₁, odd]
  -- ⊢ ∃ k, (2 * k₁ + 1) * (2 * k₁ + 1) = 2 * k + 1
  exists 2 * k₁ * k₁ + 2 * k₁
  -- ⊢ (2 * k₁ + 1) * (2 * k₁ + 1) = 2 * (2 * k₁ * k₁ + 2 * k₁) + 1
  grind
Each tactic transforms the goal until nothing remains. The kernel checks the final proof term.
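A tactic script is just a way of constructing a proof term; the kernel checks the term, not the script. The same `odd 3` fact can be given as an explicit term (a sketch: `rfl` works because `2 * 1 + 1` reduces to `3` by computation):

```lean
-- The anonymous constructor ⟨…⟩ supplies the witness k = 1,
-- and `rfl` proves 3 = 2 * 1 + 1 by reduction.
example : odd 3 := ⟨1, rfl⟩
```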

example : odd 3 := ⟨2, by decide⟩
-- Error: Tactic `decide` proved that the proposition
--   3 = 2 * 2 + 1
-- is false
You don't need to trust me, or my automation. You only need to trust the small kernel.
An incorrect proof is rejected immediately.
Lean has multiple independent kernels.
You can build your own and submit it to arena.lean-lang.org.

Rich user interface and integrated tooling.
Build system (Lake), LSP server, and VS Code plugin.
Real-time feedback: errors, goals, and hints as you type.

The IDE is where I collaborate with AI now.

I asked Claude to prove append_length. It wrote the proof, Lean checked it. No goals remaining.


200,000+ theorems. 750+ contributors.
From basic algebra to modern research mathematics.
Fewer than 1,000 definitions are still missing for coverage of modern research mathematics.
The dataset every AI math system trains on.


In 2020, Peter Scholze posed a formalization challenge.
"I spent much of 2019 obsessed with the proof of this theorem, almost getting crazy over it. I still have some small lingering doubts." — Peter Scholze
Johan Commelin led a team that verified the proof, with only minor corrections.
They verified and simplified it without fully understanding the proof. Lean was a guide.
"The Lean Proof Assistant was really that: an assistant in navigating through the thick jungle that this proof is." — Peter Scholze


Five Fields Medalists engaged: Tao, Scholze, Viazovska, Gowers, Hairer.
Equational Theories Project (completed) — Terence Tao
Carleson's Theorem (completed) — Floris van Doorn
Fields Medal sphere packing — AI assisted
Fermat's Last Theorem — Kevin Buzzard
abc conjecture



Lean's linarith couldn't close a step because he only had n > 2 but needed 0 < n - 3. He added a footnote to the published paper.
Even the best mathematicians benefit from machine checking.

"We had formalized the proof with this constant 12, and then when this new paper came out, we said, 'Okay, let's update the 12 to 11.' And what you can do with Lean is that you just in your headline theorem change a 12 to 11. You run the compiler and... of the thousands of lines of code you have, 90% of them still work." — Terence Tao, Lex Fridman interview
Formal math is maintainable. This is new.

It is easier to read formal math than write it.
Formal math is easier to read than LaTeX.
You can click through, inspect definitions, check types.
AI can explain formal proofs.
AI is making the writing part easier.
Humans need to be able to read formal math.


Machine-checkable proofs enable a new level of collaboration.
Auto refactoring and generalization for free.
It is not just about proving but also understanding complex objects and proofs, getting new insights, and navigating through structures beyond our cognitive abilities.


"I thought AI would prove all theorems for us now."
Tactics are like "game moves".
Better tactics = shorter proofs.
Better tactics = more powerful AI.
grind is a new "game move."
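A toy illustration of a "move" becoming more powerful (my example, not from the talk): the same goal, closed step by step with classic tactics, or in one move:

```lean
-- Step by step: rewrite with each equation in turn.
example (a b c : Nat) (h₁ : a = b) (h₂ : b = c) : a + 1 = c + 1 := by
  rw [h₁, h₂]

-- One move: grind merges the equalities and closes the goal.
example (a b c : Nat) (h₁ : a = b) (h₂ : b = c) : a + 1 = c + 1 := by
  grind
```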


New proof automation (since Lean v4.22), developed by Kim Morrison and me.
A proof-automation tactic inspired by modern SMT solvers. Think of it as a virtual whiteboard:
Discovers new equalities, inequalities, etc.
Writes facts on the board and merges equivalent terms
Multiple engines cooperate on the same workspace
Cooperating Engines:
Congruence closure; E-matching; Constraint propagation; Guided case analysis
Satellite theory solvers (linear integer arithmetic, commutative rings, linear arithmetic, AC)
Supports dependent types, type-class system, and dependent pattern matching
Produces ordinary Lean proof terms for every fact.
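A small sketch of the engines cooperating (my example, not from the talk): congruence closure merges `f a` with `f b` once `a = b` is on the board, and the linear arithmetic solver then squeezes `f a` between the two bounds:

```lean
-- h₁ lets congruence closure identify f a with f b;
-- h₂ and h₃ then force f a = 5 by linear arithmetic.
example (f : Nat → Nat) (a b : Nat)
    (h₁ : a = b) (h₂ : f a ≤ 5) (h₃ : 5 ≤ f b) : f a = 5 := by
  grind
```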


"I have been experimenting recently with the new tactic grind in Lean, which is a powerful tool inspired by 'good old-fashioned AI' such as satisfiability modulo theories (SMT) solvers... When I apply grind to a given subgoal, it can report a success within seconds... But, importantly, when this does not work, I quickly get a grind failed message."

I built heuristics for Z3: quantifier instantiation, case-split ordering, proof search.
AI can do this better. AI replaces the heuristics.
The algorithmic core (congruence closure, linear arithmetic, decision procedures) is more valuable than ever.
grind is designed this way: strong algorithms, AI-replaceable search.
example (f : R → Nat)
    : max 3 (4 * f ((cos x + sin x)^2)) ≠ 2 + f (2 * cos x * sin x + 1) := by
  grind =>
    use [Nat.max_def, trig_identity]
    ring
    -- [ring] Ring `Rat`, basis: cos x ^ 2 + sin x ^ 2 + -1 = 0
    -- [ring] Ring `Int`, basis:
    --   ↑(f (2 * cos x * sin x + 1)) + -1 * ↑(f ((cos x + sin x) ^ 2)) = 0
    cases_next

When grind closes a goal in one step instead of fifty, the AI's search tree shrinks dramatically.
Compact proofs = better training data = better AI provers.
The flywheel: better automation, compact proofs, better AI, more users, more libraries, repeat.
example {α} [CommRing α] [IsCharP α 0] [NoNatZeroDivisors α]
    (d t d_inv : α)
    (Δ40 : d * (d + t + d * t) = 0)
    (Δ41 : d^2 * (d + d * t - 2 * d * t^2 + d * t^4 + d^2 * t^4) = 0)
    : d * d_inv = 1 → t + 2 * t^2 - t^3 - 2 * t^4 + t^5 = 0 := by
  grind
Quantum Algebra example from “Categories generated by a trivalent vertex”, Morrison, Peters, and Snyder

Every AI system that solved IMO problems with formal proofs used Lean.
AlphaProof (Google DeepMind) — silver medal, IMO 2024
Aristotle (Harmonic) — gold medal, IMO 2025
SEED Prover (ByteDance) — silver medal, IMO 2025
Three independent teams. No competing platform was used by any of them.

Axiom: valued at $1.5 billion. 12/12 on Putnam 2025.
DeepSeek Prover-V2: 88.9% on MiniF2F-test. Open-source, 671B parameters.
Harmonic: valued at $1.45 billion. Built the Aristotle AI.
None of these systems existed in early 2024.



"The proof was found by Aristotle AI, then rewritten by me in Mathlib style."

Lean is implemented in Lean. You can access Lean's internal data using Lean.
Mathlib is the training data. 200,000+ theorems. 2.2M+ lines.
Machine-checkable proofs = unfakeable evaluation metric.
Open source. No vendor lock-in.
Five Fields Medalists. 750+ contributors.

"I genuinely do not know of another tool that can do all of the following:" — Markus de Medeiros
ML-style (not insane) functional programming
Metaprogramming so easy you can do it just to make your life easier
Access to every C library ever written
Honest to god proofs for your pure code
Monadic style (not insane) imperative programming
He implemented a cryptographic protocol in Lean: OpenSSL bindings via FFI, a custom DSL via metaprogramming, and formal proofs of correctness.
All in one language. Blog post

So many people are choosing Lean that even casual users are proving theorems with ChatGPT. "We used ChatGPT to vibe code a Lean proof."

"The bottleneck for using AI to create strategies and make conjectures is we have to rely on human experts and the test of time to validate whether something is plausible or not." — Terence Tao, Dwarkesh Patel interview

AI is not just changing who uses Lean. It changed how I work.
Kim Morrison (Lean FRO) showed me a major refactoring step: more than 10,000 lines changed across Mathlib, implemented by Claude Code.
Since then, AI has removed the pain of developing software.
I used to believe my superpower was that I could tolerate a lot of pain. That superpower is no longer relevant.
"Oh my gosh, how do you manage to do this?" — Jeremy Avigad, watching me analyze gigantic Lean traces to isolate a bug (2016)
AI now does this. It isolates issues, tries new ideas, runs experiments, analyzes performance, and much more.

AI will be Lean's main user. I seldom use the Lean tools directly anymore.
Humans will interact with Lean at a much higher level of abstraction.
Many improvements coming soon:
Scalability: we are optimizing every subsystem to support 100x more formalized mathematics
Disk-based resumption: >100x faster AI-Lean interaction
AI-optimized protocol for proof exploration at scale
grind diagnostics designed for AI consumption
Python interop for the AI ecosystem

"It's really important with these formal proof assistants that there are no backdoors or exploits you can use to somehow get your certified proof without actually proving it, because reinforcement learning is just so good at finding these backdoors." — Terence Tao
The kernel's soundness is AI safety infrastructure.
Who Watches the Provers? | Comparator | Validating a Lean Proof

The kernel is trustworthy. So what can you build on that trust?
An inefficient program is a specification. I wrote this:
inductive Tree (α : Type u) where
  | leaf
  | node (left : Tree α) (a : α) (right : Tree α)

def Tree.map (f : α → β) : Tree α → Tree β
  | .leaf => .leaf
  | .node l a r => .node (map f l) (f a) (map f r)
My prompt to AI: "Implement a tail-recursive version, and prove it computes the same result."

def Tree.size : Tree α → Nat
  | .leaf => 0
  | .node l _ r => 1 + l.size + r.size

/-- Stack frame for tail-recursive tree map. -/
inductive Tree.Frame (α : Type u) (β : Type v) where
  | visit (val : β) (right : Tree α)
  | done (left : Tree β) (val : β)

def stackWeight : List (Tree.Frame α β) → Nat
  | [] => 0
  | .visit _ r :: s => 4 * r.size + 3 + stackWeight s
  | .done _ _ :: s => 1 + stackWeight s
stackWeight is the termination measure. It assigns a weight to each stack frame so that every transition (go-to-go, go-to-back, back-to-go, back-to-back) strictly decreases. visit frames are heavier because they still carry unprocessed subtrees.
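As a sanity check on one transition (the go-to-go step, where `mapGo` pushes a `visit` frame and descends into the left child of `node l a r`), the arithmetic indeed decreases. `omega` confirms the inequality the termination obligations reduce to:

```lean
-- Before: 4 * (1 + ls + rs) + W + 1; after: 4 * ls + (4 * rs + 3 + W) + 1,
-- where ls, rs are the subtree sizes and W the old stack weight.
example (ls rs W : Nat) :
    4 * ls + (4 * rs + 3 + W) + 1 < 4 * (1 + ls + rs) + W + 1 := by
  omega
```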

mutual
  def Tree.mapGo (f : α → β) (stack : List (Tree.Frame α β))
      (t : Tree α) : Tree β :=
    match t with
    | .node l a r => mapGo f (.visit (f a) r :: stack) l
    | .leaf => mapBack f stack .leaf
  termination_by 4 * t.size + stackWeight stack + 1
  decreasing_by all_goals simp only [stackWeight, size]; omega

  def Tree.mapBack (f : α → β) (stack : List (Tree.Frame α β))
      (result : Tree β) : Tree β :=
    match stack with
    | .visit a r :: rest => mapGo f (.done result a :: rest) r
    | .done l a :: rest => mapBack f rest (.node l a result)
    | [] => result
  termination_by stackWeight stack
  decreasing_by all_goals simp only [stackWeight]; omega
end
Two mutually recursive functions with an explicit stack. A termination measure. omega closes all obligations.

theorem Tree.mapTR_eq_map (f : α → β) (t : Tree α) :
    Tree.mapTR f t = Tree.map f t := by
  simp only [Tree.mapTR, Tree.mapGo_spec, Tree.applyStack]
The key idea: applyStack interprets the stack as a context. The mutual lemmas show both functions respect this interpretation. The main theorem follows in one line.
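A plausible shape for `applyStack` (the actual definition is in the full source; this sketch is mine): each frame records work still owed, and applying the stack finishes that work functionally.

```lean
-- Hypothetical sketch, not the project's exact code: a `visit`
-- frame still owes the value and the unmapped right subtree; a
-- `done` frame still owes reattaching a finished left subtree.
def Tree.applyStack (f : α → β) : List (Tree.Frame α β) → Tree β → Tree β
  | [], acc => acc
  | .visit a r :: s, acc => applyStack f s (.node acc a (map f r))
  | .done l a :: s, acc => applyStack f s (.node l a acc)
```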
I wrote 3 lines. AI wrote 70 lines of optimized code and correctness proof. No supervision. The simple version was the specification. Full source

Google: 30%+ of new code is AI-generated. Microsoft: 25-30%. Meta: targeting 50%.
45% of AI-generated code fails basic security tests (Veracode, 100+ models). Newer, larger models do not help.
"I 'Accept All' always, I don't read the diffs anymore." — Andrej Karpathy
The same AI that creates this problem can generate verified proofs. Both sides run on Lean.

AI converted zlib (a C compression library) to Lean.
The Lean version passed the test suite, and the AI then proved the code correct.
theorem zlib_decompressSingle_compress (data : ByteArray) (level : UInt8)
    (hsize : data.size < 1024 * 1024 * 1024) :
    ZlibDecode.decompressSingle (ZlibEncode.compress data level) = .ok data
Decompression always recovers the original data. Every compression level. Machine-checked.
The barrier is no longer AI capability. It is platform readiness.

10 AI agents built a verified embedded DSL with proved optimizations in a single weekend.
Zero sorry — 52 theorems, all complete
5 verified compiler optimizations
Determinism, type safety, memory safety — all proved
Interpreter correctness — sound and complete
27 modules, ~7,400 lines of Lean
I did not touch the code. Zero lines. Full agent autonomy.

def bubbleSort := `[RStmt|
  let arr := new uint64[][10];
  arr[0] := 5; arr[1] := 3; arr[2] := 8;
  arr[3] := 1; arr[4] := 9; arr[5] := 2;
  arr[6] := 7; arr[7] := 4; arr[8] := 6;
  arr[9] := 0;
  i := 0;
  while (i < 9) {
    j := 0;
    while (j < 9 - i) {
      let a : uint64 = arr[j];
      let b : uint64 = arr[j + 1];
      if (a > b) {
        arr[j] := b;
        arr[j + 1] := a;
      }
      j := j + 1;
    }
    i := i + 1;
  }
  first := arr[0];
  last := arr[9];
]

#eval! run bubbleSort "first"  -- "0"
#eval! run bubbleSort "last"   -- "9"

-- Big-step: ⟨σ, s⟩ ⇓ r
-- 17 rules: skip, assign, decl, seqNormal, seqReturn,
-- ifTrue, ifFalse, whileTrue, whileReturn, whileFalse,
-- alloc, free, arrSet, ret, block, callStmt, scope

example : BigStep σ .skip (.normal σ) :=
  BigStep.skip

example (he : e.eval σ = some v)
    (hs : σ.setVar x v = some σ') :
    BigStep σ (.assign x e) (.normal σ') :=
  BigStep.assign he hs

example (h₁ : BigStep σ₁ s₁ (.normal σ₂))
    (h₂ : BigStep σ₂ s₂ r) :
    BigStep σ₁ (s₁ ;; s₂) r :=
  BigStep.seqNormal h₁ h₂

example (hc : e.eval σ₁ = some (.bool true))
    (hb : BigStep σ₁ b (.normal σ₂))
    (hw : BigStep σ₂ (.while e b) r) :
    BigStep σ₁ (.while e b) r :=
  BigStep.whileTrue hc hb hw

grind Does The Work

-- Same state + same statement → same result
-- 80-line proof by induction on h₁
-- grind closes equational contradictions
example (h₁ : BigStep σ s r₁)
    (h₂ : BigStep σ s r₂) : r₁ = r₂ :=
  BigStep.det h₁ h₂

In the real proof, cases like `| assign => grind` handle the equational reasoning; grind discharges contradictions such as `some true = some false`. The recursive cases (seq, while, call, scope) apply the induction hypotheses explicitly.

A fuel-based interpreter (Stmt.interp) proved equivalent to the relational semantics in both directions:
-- Completeness: BigStep implies interp succeeds
example (h : BigStep σ s r) :
    ∃ fuel, s.interp fuel σ =
      (.ok r.retVal, r.state) :=
  Stmt.interp_complete h

-- Soundness: interp success implies BigStep
example (h : s.interp fuel σ = (.ok rv, σ')) :
    BigStep σ s (toStmtResult rv σ') :=
  Stmt.interp_sound h
The relational semantics is the spec. The interpreter is the implementation. These two theorems say they agree exactly.
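The two directions also compose with determinism (`BigStep.det`, shown earlier): any two successful interpreter runs of the same statement from the same state agree, regardless of fuel. A sketch, eliding the implicit binders:

```lean
-- Soundness maps both runs into BigStep; determinism forces
-- the two results to coincide.
example (h₁ : s.interp fuel₁ σ = (.ok rv₁, σ₁))
    (h₂ : s.interp fuel₂ σ = (.ok rv₂, σ₂)) :
    toStmtResult rv₁ σ₁ = toStmtResult rv₂ σ₂ :=
  BigStep.det (Stmt.interp_sound h₁) (Stmt.interp_sound h₂)
```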

example (h : BigStep σ s r) :
    BigStep σ s.constFold r :=
  Stmt.constFold_correct h

example (h : BigStep σ s r) :
    BigStep σ s.deadCodeElim r :=
  Stmt.deadCodeElim_correct h

example (h : BigStep σ s r) :
    BigStep σ s.copyPropagation r :=
  Stmt.copyProp_correct h
Each pass: Stmt → Stmt with a correctness theorem. "If the original runs, the optimized runs too."
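Because each theorem has the same shape, passes compose: chaining the correctness lemmas verifies the whole pipeline.

```lean
-- Run constant folding, then dead-code elimination; the combined
-- pass preserves behavior by chaining the two theorems above.
example (h : BigStep σ s r) :
    BigStep σ s.constFold.deadCodeElim r :=
  Stmt.deadCodeElim_correct (Stmt.constFold_correct h)
```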

Unlike most human users, Claude reads the documentation. One stuck agent? I said "Read the grind docs." It downloaded, understood, unblocked.
Examples are powerful: agents found our DSL examples and adopted the same pattern.
Our AI agents are better Lean users than our human users.
More precise: structured error context, not "it doesn't work"
More systematic: they hit every edge case, not just the one they needed
More actionable: "tactic X fails on pattern Y because Z"

Authorization policy engine. Verified in Lean.
No version ships unless proofs pass.
"We've found Lean to be a great tool for verified software development." — Emina Torlak
Proved validator soundness in 18 person-days.



Core cryptographic library. Verified using Aeneas + Lean.
They verify the Rust code as written by software engineers.
Code is evolving (new optimizations for specific hardware). They must adapt to rewrites.

Combines model checking with full formal proof.
Found an inconsistency in a prior formal verification of Rabia consensus that went undetected across two separate tools.
Even the verifiers can be wrong. Lean caught it.


AI will write proofs. AI will write code. AI will even draft specifications from natural language.
But reading the formal specification and saying "yes, this is what I meant" remains the human job.
The formal statement is the trust anchor.
Designing the right abstractions, choosing the right interfaces: that's the human contribution.

Nonprofit. 20 engineers. Open source. Controlled by no single company.
Sebastian Ullrich and I launched it in August 2023.
28 releases and 8,000+ pull requests merged since its launch.
"Lean is not a research project anymore. We run the Lean FRO as a startup."
ACM SIGPLAN Award 2025: "Lean has become the de facto choice for AI-based systems of mathematical reasoning."
The Lean project was featured in: NY Times, Quanta, Scientific American, Wired, etc.

8,000+ GitHub repos.
200,000+ VS Code installs.
CSLib: a Mathlib for computer science (Stanford, Amazon, Google DeepMind).
PhysLean: Digitalizing Physics in Lean 4
Companies pivoting math-proven AIs toward program verification:
spec auto-formalization + program synthesis + verification as a service.

Future math papers will have Lean code that is checked.
Lecture notes, papers, and slides — all in one tool.
These slides are written in Verso.
lean-lang.org is written in Verso.
leodemoura.github.io is written in Verso.




I opened this talk with something unexpected: a general-purpose AI converted a production C library to Lean and proved the Lean version correct. A year ago, this was not expected to be possible.
“It's quite possible at the high school level... that you could get involved in a math project and actually make a real contribution because of all these AI tools, Lean, and everything else.” — Terence Tao, Dwarkesh Patel interview
Formalization isn't just about proving. It's about understanding, collaborating, and building trust at a scale beyond our cognitive abilities.
The entire discipline thrives when no one has to “take it on faith.”
