

A general-purpose AI, with no special training for theorem proving, converted zlib (a C compression library) to Lean, passed the test suite, and then proved that the code is correct.
Not tested. Proved. For every possible input. lean-zip
"This was not expected to be possible yet."
How? Let me show you.

A programming language and a proof assistant
Same language for code and proofs. No translation layer.
Small trusted kernel. Proofs can be exported and independently checked.
Lean is implemented in Lean. Very extensible.
The math community made Lean their own: >50,000 lines of extensions.
def reverse : List α → List α
  | [] => []
  | x :: xs => reverse xs ++ [x]

theorem reverse_append (xs ys : List α)
    : reverse (xs ++ ys) = reverse ys ++ reverse xs := by
  induction xs with
  | nil => simp [reverse]
  | cons x xs ih => simp [reverse, ih]

#eval reverse [1, 2, 3]
-- [3, 2, 1]

def odd (n : Nat) : Prop := ∃ k, n = 2 * k + 1
The type of odd 3 is Prop, a statement that can be true or false.
We can't evaluate it: odd returns a Prop, not a Bool.
But we can prove it by providing a witness.
example : odd 3 := ⟨1, by decide⟩
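For contrast, a Boolean version of the same predicate can be evaluated directly. A small sketch (the name `oddb` is mine, not from the talk):

```lean
-- `oddb` is a hypothetical Boolean counterpart of `odd`:
-- a computation rather than a proposition.
def oddb (n : Nat) : Bool := n % 2 == 1

#eval oddb 3  -- true

-- `decide` bridges the two worlds: it evaluates a decidable
-- proposition and turns the result into a proof.
example : 3 % 2 = 1 := by decide
```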

"You have written my favorite computer game" — Kevin Buzzard
The "game board": you see goals and hypotheses, then apply "moves" (tactics).
theorem square_of_odd_is_odd : odd n → odd (n * n) := by
  intro ⟨k₁, e₁⟩
  -- e₁ : n = 2 * k₁ + 1 ⊢ odd (n * n)
  simp [e₁, odd]
  -- ⊢ ∃ k, (2 * k₁ + 1) * (2 * k₁ + 1) = 2 * k + 1
  exists 2 * k₁ * k₁ + 2 * k₁
  -- ⊢ (2 * k₁ + 1) * (2 * k₁ + 1) = 2 * (2 * k₁ * k₁ + 2 * k₁) + 1
  grind
Each tactic transforms the goal until nothing remains. The kernel checks the final proof term.
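A tactic script is just a way of constructing a proof term; the kernel checks the term, not the script. The same `odd 3` fact can be given as an explicit term (a sketch: `rfl` works because `2 * 1 + 1` reduces to `3` by computation):

```lean
-- The anonymous constructor ⟨…⟩ supplies the witness k = 1,
-- and `rfl` proves 3 = 2 * 1 + 1 by reduction.
example : odd 3 := ⟨1, rfl⟩
```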

example : odd 3 := ⟨2, by decide⟩
-- Error: Tactic `decide` proved that the proposition
--   3 = 2 * 2 + 1
-- is false
You don't need to trust me, or my automation. You only need to trust the small kernel.
An incorrect proof is rejected immediately.
Lean has multiple independent kernels.
You can build your own and submit it to arena.lean-lang.org.

Rich user interface and integrated tooling.
Build system (Lake), LSP server, and VS Code plugin.
Real-time feedback: errors, goals, and hints as you type.

The IDE is where I collaborate with AI now.

I asked Claude to prove append_length. It wrote the proof, Lean checked it. No goals remaining.


200,000+ theorems. 750+ contributors.
From basic algebra to modern research mathematics.
Fewer than 1,000 definitions are still missing for coverage of modern research mathematics.
The dataset every AI math system trains on.


In 2020, Peter Scholze posed a formalization challenge.
"I spent much of 2019 obsessed with the proof of this theorem, almost getting crazy over it. I still have some small lingering doubts." — Peter Scholze
Johan Commelin led a team that verified the proof, with only minor corrections.
They verified and simplified it without fully understanding the proof. Lean was a guide.
"The Lean Proof Assistant was really that: an assistant in navigating through the thick jungle that this proof is." — Peter Scholze


Five Fields Medalists engaged: Tao, Scholze, Viazovska, Gowers, Hairer.
Equational Theories Project (completed) — Terence Tao
Carleson's Theorem (completed) — Floris van Doorn
Fields Medal sphere packing — AI assisted
Fermat's Last Theorem — Kevin Buzzard
abc conjecture



Lean's linarith couldn't close a step because he only had n > 2 but needed 0 < n - 3. He added a footnote to the published paper.
Even the best mathematicians benefit from machine checking.

"We had formalized the proof with this constant 12, and then when this new paper came out, we said, 'Okay, let's update the 12 to 11.' And what you can do with Lean is that you just in your headline theorem change a 12 to 11. You run the compiler and... of the thousands of lines of code you have, 90% of them still work." — Terence Tao, Lex Fridman interview
Formal math is maintainable. This is new.

It is easier to read formal math than write it.
Formal math is easier to read than LaTeX.
You can click through, inspect definitions, check types.
AI can explain formal proofs.
AI is making the writing part easier.
Humans need to be able to read formal math.


Machine-checkable proofs enable a new level of collaboration.
Auto refactoring and generalization for free.
It is not just about proving but also understanding complex objects and proofs, getting new insights, and navigating through structures beyond our cognitive abilities.


"I thought AI would prove all theorems for us now."
Tactics are like "game moves".
Better tactics = shorter proofs.
Better tactics = more powerful AI.
grind is a new "game move."
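A toy illustration of a "move" becoming more powerful (my example, not from the talk): the same goal, closed step by step with classic tactics, or in one move:

```lean
-- Step by step: rewrite with each equation in turn.
example (a b c : Nat) (h₁ : a = b) (h₂ : b = c) : a + 1 = c + 1 := by
  rw [h₁, h₂]

-- One move: grind merges the equalities and closes the goal.
example (a b c : Nat) (h₁ : a = b) (h₂ : b = c) : a + 1 = c + 1 := by
  grind
```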


New proof automation (since Lean v4.22), developed by Kim Morrison and me.
A proof-automation tactic inspired by modern SMT solvers. Think of it as a virtual whiteboard:
Discovers new equalities, inequalities, etc.
Writes facts on the board and merges equivalent terms
Multiple engines cooperate on the same workspace
Cooperating Engines:
Congruence closure; E-matching; Constraint propagation; Guided case analysis
Satellite theory solvers (linear integer arithmetic, commutative rings, linear arithmetic, AC)
Supports dependent types, type-class system, and dependent pattern matching
Produces ordinary Lean proof terms for every fact.
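A small sketch of the engines cooperating (my example, not from the talk): congruence closure merges `f a` with `f b` once `a = b` is on the board, and the linear arithmetic solver then squeezes `f a` between the two bounds:

```lean
-- h₁ lets congruence closure identify f a with f b;
-- h₂ and h₃ then force f a = 5 by linear arithmetic.
example (f : Nat → Nat) (a b : Nat)
    (h₁ : a = b) (h₂ : f a ≤ 5) (h₃ : 5 ≤ f b) : f a = 5 := by
  grind
```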


"I have been experimenting recently with the new tactic grind in Lean, which is a powerful tool inspired by 'good old-fashioned AI' such as satisfiability modulo theories (SMT) solvers... When I apply grind to a given subgoal, it can report a success within seconds... But, importantly, when this does not work, I quickly get a grind failed message."

I built heuristics for Z3: quantifier instantiation, case-split ordering, proof search.
AI can do this better. AI replaces the heuristics.
The algorithmic core (congruence closure, linear arithmetic, decision procedures) is more valuable than ever.
grind is designed this way: strong algorithms, AI-replaceable search.
example (f : R → Nat)
    : max 3 (4 * f ((cos x + sin x)^2)) ≠ 2 + f (2 * cos x * sin x + 1) := by
  grind =>
    use [Nat.max_def, trig_identity]
    ring
    -- [ring] Ring `Rat`, basis: cos x ^ 2 + sin x ^ 2 + -1 = 0
    -- [ring] Ring `Int`, basis:
    --   ↑(f (2 * cos x * sin x + 1)) + -1 * ↑(f ((cos x + sin x) ^ 2)) = 0
    cases_next

When grind closes a goal in one step instead of fifty, the AI's search tree shrinks dramatically.
Compact proofs = better training data = better AI provers.
The flywheel: better automation, compact proofs, better AI, more users, more libraries, repeat.
example {α} [CommRing α] [IsCharP α 0] [NoNatZeroDivisors α]
    (d t d_inv : α)
    (Δ40 : d * (d + t + d * t) = 0)
    (Δ41 : d^2 * (d + d * t - 2 * d * t^2 + d * t^4 + d^2 * t^4) = 0)
    : d * d_inv = 1 → t + 2 * t^2 - t^3 - 2 * t^4 + t^5 = 0 := by
  grind
Quantum Algebra example from “Categories generated by a trivalent vertex”, Morrison, Peters, and Snyder

Every AI system that solved IMO problems with formal proofs used Lean.
AlphaProof (Google DeepMind) — silver medal, IMO 2024
Aristotle (Harmonic) — gold medal, IMO 2025
SEED Prover (ByteDance) — silver medal, IMO 2025
Three independent teams. No competing platform was used by any of them.

Axiom: valued at $1.5 billion. 12/12 on Putnam 2025.
DeepSeek Prover-V2: 88.9% on MiniF2F-test. Open-source, 671B parameters.
Harmonic: valued at $1.45 billion. Built the Aristotle AI.
None of these systems existed in early 2024.



"The proof was found by Aristotle AI, then rewritten by me in Mathlib style."

Lean is implemented in Lean. You can access Lean's internal data using Lean.
Mathlib is the training data. 200,000+ theorems. 2.2M+ lines.
Machine-checkable proofs = unfakeable evaluation metric.
Open source. No vendor lock-in.
Five Fields Medalists. 750+ contributors.

"I genuinely do not know of another tool that can do all of the following:" — Markus de Medeiros
ML-style (not insane) functional programming
Metaprogramming so easy you can do it just to make your life easier
Access to every C library ever written
Honest to god proofs for your pure code
Monadic style (not insane) imperative programming
He implemented a cryptographic protocol in Lean: OpenSSL bindings via FFI, a custom DSL via metaprogramming, and formal proofs of correctness.
All in one language. Blog post

So many people are choosing Lean that even casual users are proving theorems with ChatGPT. "We used ChatGPT to vibe code a Lean proof."

"The bottleneck for using AI to create strategies and make conjectures is we have to rely on human experts and the test of time to validate whether something is plausible or not." — Terence Tao, Dwarkesh Patel interview

AI is not just changing who uses Lean. It changed how I work.
Kim Morrison (Lean FRO) showed me a major refactoring step: more than 10,000 lines changed across Mathlib, implemented by Claude Code.
Since then, AI has removed the pain of developing software.
I used to believe my superpower was that I could tolerate a lot of pain. That superpower is no longer relevant.
"Oh my gosh, how do you manage to do this?" — Jeremy Avigad, watching me analyze gigantic Lean traces to isolate a bug (2016)
AI now does this. It isolates issues, tries new ideas, runs experiments, analyzes performance, and much more.

AI will be Lean's main user. I seldom use the Lean tools directly anymore.
Humans will interact with Lean at a much higher level of abstraction.
Many improvements coming soon:
Scalability: we are optimizing every subsystem to support 100x more formalized mathematics
Disk-based resumption: >100x faster AI-Lean interaction
AI-optimized protocol for proof exploration at scale
grind diagnostics designed for AI consumption
Python interop for the AI ecosystem

"It's really important with these formal proof assistants that there are no backdoors or exploits you can use to somehow get your certified proof without actually proving it, because reinforcement learning is just so good at finding these backdoors." — Terence Tao
The kernel's soundness is AI safety infrastructure.
Who Watches the Provers? | Comparator | Validating a Lean Proof

The kernel is trustworthy. So what can you build on that trust?
An inefficient program is a specification. I wrote this:
inductive Tree (α : Type u) where
  | leaf
  | node (left : Tree α) (a : α) (right : Tree α)

def Tree.map (f : α → β) : Tree α → Tree β
  | .leaf => .leaf
  | .node l a r => .node (map f l) (f a) (map f r)
My prompt to AI: "Implement a tail-recursive version, and prove it computes the same result."

def Tree.size : Tree α → Nat
  | .leaf => 0
  | .node l _ r => 1 + l.size + r.size

/-- Stack frame for tail-recursive tree map. -/
inductive Tree.Frame (α : Type u) (β : Type v) where
  | visit (val : β) (right : Tree α)
  | done (left : Tree β) (val : β)

def stackWeight : List (Tree.Frame α β) → Nat
  | [] => 0
  | .visit _ r :: s => 4 * r.size + 3 + stackWeight s
  | .done _ _ :: s => 1 + stackWeight s
stackWeight is the termination measure. It assigns a weight to each stack frame so that every transition (go-to-go, go-to-back, back-to-go, back-to-back) strictly decreases. visit frames are heavier because they still carry unprocessed subtrees.
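As a sanity check on one transition (the go-to-go step, where `mapGo` pushes a `visit` frame and descends into the left child of `node l a r`), the arithmetic indeed decreases. `omega` confirms the inequality the termination obligations reduce to:

```lean
-- Before: 4 * (1 + ls + rs) + W + 1; after: 4 * ls + (4 * rs + 3 + W) + 1,
-- where ls, rs are the subtree sizes and W the old stack weight.
example (ls rs W : Nat) :
    4 * ls + (4 * rs + 3 + W) + 1 < 4 * (1 + ls + rs) + W + 1 := by
  omega
```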

mutual
  def Tree.mapGo (f : α → β) (stack : List (Tree.Frame α β))
      (t : Tree α) : Tree β :=
    match t with
    | .node l a r => mapGo f (.visit (f a) r :: stack) l
    | .leaf => mapBack f stack .leaf
  termination_by 4 * t.size + stackWeight stack + 1
  decreasing_by all_goals simp only [stackWeight, size]; omega

  def Tree.mapBack (f : α → β) (stack : List (Tree.Frame α β))
      (result : Tree β) : Tree β :=
    match stack with
    | .visit a r :: rest => mapGo f (.done result a :: rest) r
    | .done l a :: rest => mapBack f rest (.node l a result)
    | [] => result
  termination_by stackWeight stack
  decreasing_by all_goals simp only [stackWeight]; omega
end
Two mutually recursive functions with an explicit stack. A termination measure. omega closes all obligations.

theorem Tree.mapTR_eq_map (f : α → β) (t : Tree α) :
    Tree.mapTR f t = Tree.map f t := by
  simp only [Tree.mapTR, Tree.mapGo_spec, Tree.applyStack]
The key idea: applyStack interprets the stack as a context. The mutual lemmas show both functions respect this interpretation. The main theorem follows in one line.
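A plausible shape for `applyStack` (the actual definition is in the full source; this sketch is mine): each frame records work still owed, and applying the stack finishes that work functionally.

```lean
-- Hypothetical sketch, not the project's exact code: a `visit`
-- frame still owes the value and the unmapped right subtree; a
-- `done` frame still owes reattaching a finished left subtree.
def Tree.applyStack (f : α → β) : List (Tree.Frame α β) → Tree β → Tree β
  | [], acc => acc
  | .visit a r :: s, acc => applyStack f s (.node acc a (map f r))
  | .done l a :: s, acc => applyStack f s (.node l a acc)
```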
I wrote 3 lines. AI wrote 70 lines of optimized code and correctness proof. No supervision. The simple version was the specification. Full source

Google: 30%+ of new code is AI-generated. Microsoft: 25-30%. Meta: targeting 50%.
45% of AI-generated code fails basic security tests (Veracode, 100+ models). Newer, larger models do not help.
"I 'Accept All' always, I don't read the diffs anymore." — Andrej Karpathy
The same AI that creates this problem can generate verified proofs. Both sides run on Lean.

AI converted zlib (a C compression library) to Lean.
The Lean version passed the test suite, and the AI then proved the code correct.
theorem zlib_decompressSingle_compress (data : ByteArray) (level : UInt8)
    (hsize : data.size < 1024 * 1024 * 1024) :
    ZlibDecode.decompressSingle (ZlibEncode.compress data level) = .ok data
Decompression always recovers the original data. Every compression level. Machine-checked.
The barrier is no longer AI capability. It is platform readiness.

10 AI agents built a verified embedded DSL with proved optimizations in a single weekend.
Zero sorry — 52 theorems, all complete
5 verified compiler optimizations
Determinism, type safety, memory safety — all proved
Interpreter correctness — sound and complete
27 modules, ~7,400 lines of Lean
I did not touch the code. Zero lines. Full agent autonomy.

def bubbleSort := `[RStmt|
  let arr := new uint64[][10];
  arr[0] := 5; arr[1] := 3; arr[2] := 8;
  arr[3] := 1; arr[4] := 9; arr[5] := 2;
  arr[6] := 7; arr[7] := 4; arr[8] := 6;
  arr[9] := 0;
  i := 0;
  while (i < 9) {
    j := 0;
    while (j < 9 - i) {
      let a : uint64 = arr[j];
      let b : uint64 = arr[j + 1];
      if (a > b) {
        arr[j] := b;
        arr[j + 1] := a;
      }
      j := j + 1;
    }
    i := i + 1;
  }
  first := arr[0];
  last := arr[9];
]

#eval! run bubbleSort "first"  -- "0"
#eval! run bubbleSort "last"   -- "9"

-- Big-step: ⟨σ, s⟩ ⇓ r
-- 17 rules: skip, assign, decl, seqNormal, seqReturn,
-- ifTrue, ifFalse, whileTrue, whileReturn, whileFalse,
-- alloc, free, arrSet, ret, block, callStmt, scope

example : BigStep σ .skip (.normal σ) :=
  BigStep.skip

example (he : e.eval σ = some v)
    (hs : σ.setVar x v = some σ') :
    BigStep σ (.assign x e) (.normal σ') :=
  BigStep.assign he hs

example (h₁ : BigStep σ₁ s₁ (.normal σ₂))
    (h₂ : BigStep σ₂ s₂ r) :
    BigStep σ₁ (s₁ ;; s₂) r :=
  BigStep.seqNormal h₁ h₂

example (hc : e.eval σ₁ = some (.bool true))
    (hb : BigStep σ₁ b (.normal σ₂))
    (hw : BigStep σ₂ (.while e b) r) :
    BigStep σ₁ (.while e b) r :=
  BigStep.whileTrue hc hb hw

grind Does The Work

-- Same state + same statement → same result
-- 80-line proof by induction on h₁
-- grind closes equational contradictions
example (h₁ : BigStep σ s r₁)
    (h₂ : BigStep σ s r₂) : r₁ = r₂ :=
  BigStep.det h₁ h₂

In the real proof, cases like `| assign => grind` handle the equational reasoning; grind discharges contradictions such as `some true = some false`. The recursive cases (seq, while, call, scope) apply the induction hypotheses explicitly.

A fuel-based interpreter (Stmt.interp) proved equivalent to the relational semantics in both directions:
-- Completeness: BigStep implies interp succeeds
example (h : BigStep σ s r) :
    ∃ fuel, s.interp fuel σ =
      (.ok r.retVal, r.state) :=
  Stmt.interp_complete h

-- Soundness: interp success implies BigStep
example (h : s.interp fuel σ = (.ok rv, σ')) :
    BigStep σ s (toStmtResult rv σ') :=
  Stmt.interp_sound h
The relational semantics is the spec. The interpreter is the implementation. These two theorems say they agree exactly.
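The two directions also compose with determinism (`BigStep.det`, shown earlier): any two successful interpreter runs of the same statement from the same state agree, regardless of fuel. A sketch, eliding the implicit binders:

```lean
-- Soundness maps both runs into BigStep; determinism forces
-- the two results to coincide.
example (h₁ : s.interp fuel₁ σ = (.ok rv₁, σ₁))
    (h₂ : s.interp fuel₂ σ = (.ok rv₂, σ₂)) :
    toStmtResult rv₁ σ₁ = toStmtResult rv₂ σ₂ :=
  BigStep.det (Stmt.interp_sound h₁) (Stmt.interp_sound h₂)
```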

example (h : BigStep σ s r) :
    BigStep σ s.constFold r :=
  Stmt.constFold_correct h

example (h : BigStep σ s r) :
    BigStep σ s.deadCodeElim r :=
  Stmt.deadCodeElim_correct h

example (h : BigStep σ s r) :
    BigStep σ s.copyPropagation r :=
  Stmt.copyProp_correct h
Each pass: Stmt → Stmt with a correctness theorem. "If the original runs, the optimized runs too."
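Because each theorem has the same shape, passes compose: chaining the correctness lemmas verifies the whole pipeline.

```lean
-- Run constant folding, then dead-code elimination; the combined
-- pass preserves behavior by chaining the two theorems above.
example (h : BigStep σ s r) :
    BigStep σ s.constFold.deadCodeElim r :=
  Stmt.deadCodeElim_correct (Stmt.constFold_correct h)
```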

Unlike most human users, Claude reads the documentation. One stuck agent? I said "Read the grind docs." It downloaded, understood, unblocked.
Examples are powerful: agents found our DSL examples and adopted the same pattern.
Our AI agents are better Lean users than our human users.
More precise: structured error context, not "it doesn't work"
More systematic: they hit every edge case, not just the one they needed
More actionable: "tactic X fails on pattern Y because Z"

Authorization policy engine. Verified in Lean.
No version ships unless proofs pass.
"We've found Lean to be a great tool for verified software development." — Emina Torlak
Proved validator soundness in 18 person-days.



Core cryptographic library. Verified using Aeneas + Lean.
They verify the Rust code as written by software engineers.
Code is evolving (new optimizations for specific hardware). They must adapt to rewrites.

Combines model checking with full formal proof.
Found an inconsistency in a prior formal verification of Rabia consensus that went undetected across two separate tools.
Even the verifiers can be wrong. Lean caught it.


AI will write proofs. AI will write code. AI will even draft specifications from natural language.
But reading the formal specification and saying "yes, this is what I meant" remains the human job.
The formal statement is the trust anchor.
Designing the right abstractions, choosing the right interfaces: that's the human contribution.

Nonprofit. 20 engineers. Open source. Controlled by no single company.
Sebastian Ullrich and I launched it in August 2023.
28 releases and 8,000+ pull requests merged since its launch.
"Lean is not a research project anymore. We run the Lean FRO as a startup."
ACM SIGPLAN Award 2025: "Lean has become the de facto choice for AI-based systems of mathematical reasoning."
The Lean project was featured in: NY Times, Quanta, Scientific American, Wired, etc.

8,000+ GitHub repos.
200,000+ VS Code installs.
CSLib: a Mathlib for computer science (Stanford, Amazon, Google DeepMind).
PhysLean: Digitalizing Physics in Lean 4
Companies pivoting math-proven AIs toward program verification:
spec auto-formalization + program synthesis + verification as a service.

Future math papers will have Lean code that is checked.
Lecture notes, papers, and slides — all in one tool.
These slides are written in Verso.
lean-lang.org is written in Verso.
leodemoura.github.io is written in Verso.




I opened this talk with something unexpected: a general-purpose AI converted a production C library to Lean and proved the Lean version correct. A year ago, this was not expected to be possible.
“It's quite possible at the high school level... that you could get involved in a math project and actually make a real contribution because of all these AI tools, Lean, and everything else.” — Terence Tao, Dwarkesh Patel interview
Formalization isn't just about proving. It's about understanding, collaborating, and building trust at a scale beyond our cognitive abilities.
The entire discipline thrives when no one has to “take it on faith.”
