Lean for AI, AI for Lean

Leonardo de Moura

Lean FRO | March 2026

Lean

The Result

10 AI agents built a verified embedded DSL with proved optimizations in a single weekend.

  • Zero sorry — 52 theorems, all complete

  • 5 verified compiler optimizations

  • Determinism, type safety, memory safety — all proved

  • Linear ownership typing with soundness proof

  • Interpreter correctness — sound and complete

  • 27 modules, ~7,400 lines of Lean


Lessons Learned

  • I did not touch the code. Zero lines. Full agent autonomy.

  • Unlike most human users, Claude reads the documentation. One stuck agent? I said "Read the grind docs." It downloaded, understood, unblocked.

  • Examples are powerful: agents found our DSL declaration examples and adopted the same pattern for Radix syntax.

  • Claude struggled with mutual induction — but found workarounds every time.

  • Claude struggled with monadic code and avoided it. We don't have enough examples for that.

  • I created a GitHub token scoped to the Radix repo only. Good thing — Claude tried to push to leanprover/lean4.

  • Toolchain version issues (v4.29.0) confused Claude when generating these slides.


Radix — A Real Program

Agents write Radix using concrete syntax macros. Verso elaborates every line:

def slideBubbleSort := `[RStmt|
  let arr := new uint64[][10];
  arr[0] := 5; arr[1] := 3; arr[2] := 8; arr[3] := 1; arr[4] := 9;
  arr[5] := 2; arr[6] := 7; arr[7] := 4; arr[8] := 6; arr[9] := 0;
  i := 0;
  while (i < 9) {
    j := 0;
    while (j < 9 - i) {
      let a : uint64 = arr[j];
      let b : uint64 = arr[j + 1];
      if (a > b) {
        arr[j] := b;
        arr[j + 1] := a;
      }
      j := j + 1;
    }
    i := i + 1;
  }
]

Big-Step Semantics — 16 Rules

-- Big-step: ⟨σ, s⟩ ⇓ r
-- 16 rules: skip, assign, decl, seqNormal, seqReturn,
-- ifTrue, ifFalse, whileTrue, whileReturn, whileFalse,
-- alloc, free, arrSet, ret, block, callStmt, scope

example : BigStep σ .skip (.normal σ) :=
  BigStep.skip

example (he : e.eval σ = some v) (hs : σ.setVar x v = some σ') :
    BigStep σ (.assign x e) (.normal σ') :=
  BigStep.assign he hs

example (h₁ : BigStep σ₁ s₁ (.normal σ₂)) (h₂ : BigStep σ₂ s₂ r) :
    BigStep σ₁ (s₁ ;; s₂) r :=
  BigStep.seqNormal h₁ h₂

example (hc : e.eval σ₁ = some (.bool true))
    (hb : BigStep σ₁ b (.normal σ₂))
    (hw : BigStep σ₂ (.while e b) r) :
    BigStep σ₁ (.while e b) r :=
  BigStep.whileTrue hc hb hw

Determinism — grind Does The Work

-- Same state + same statement → same result
-- 80-line proof by induction on h₁
-- grind closes equational contradictions

example (h₁ : BigStep σ s r₁) (h₂ : BigStep σ s r₂) : r₁ = r₂ :=
  BigStep.det h₁ h₂

In the real proof, cases h₂ with | assign => grind handles the equational cases, and grind decides contradictions such as some true = some false. The induction hypothesis is applied explicitly in the recursive cases (seq, while, call, scope).
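The shape of such a determinism proof can be seen in miniature. Below is a toy analogue on a hypothetical two-rule relation (not the Radix source): induction on the first derivation, case analysis on the second, congruence for the recursive case.

```lean
-- Toy analogue of BigStep.det (hypothetical mini-relation)
inductive Step : Nat → Nat → Prop
  | zero : Step 0 0
  | succ : Step n m → Step (n + 1) (m + 1)

theorem Step.det (h₁ : Step n m₁) (h₂ : Step n m₂) : m₁ = m₂ := by
  induction h₁ generalizing m₂ with
  | zero => cases h₂; rfl
  | succ _ ih =>
    -- case analysis on the second derivation; the mismatched
    -- constructor is ruled out automatically by its indices
    cases h₂ with
    | succ h' => exact congrArg (· + 1) (ih h')
```

The real proof follows this pattern across 16 rules, with grind discharging the equational subgoals.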


Interpreter Correctness — Sound and Complete

A fuel-based interpreter (Stmt.interp) proved equivalent to the relational semantics in both directions:

-- Completeness: BigStep implies interp succeeds
example (h : BigStep σ s r) :
    ∃ fuel, s.interp fuel σ = (.ok r.retVal, r.state) :=
  Stmt.interp_complete h

-- Soundness: interp success implies BigStep
example (h : s.interp fuel σ = (.ok rv, σ')) :
    BigStep σ s (toStmtResult rv σ') :=
  Stmt.interp_sound h

The relational semantics is the spec. The interpreter is the implementation. These two theorems say they agree exactly.
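The fuel pattern itself is standard: recurse structurally on a counter and fail when it runs out. A minimal sketch on a hypothetical three-statement language (not the actual Stmt.interp, which returns results and threads state):

```lean
inductive Stmt' where
  | skip
  | seq (s₁ s₂ : Stmt')
  | loop (s : Stmt')

def interp : Nat → Stmt' → Option Unit
  | 0, _ => none                   -- out of fuel: interpretation fails
  | _ + 1, .skip => some ()
  | fuel + 1, .seq s₁ s₂ => do
      interp fuel s₁
      interp fuel s₂
  | fuel + 1, .loop s =>
      -- unroll one iteration; fuel bounds the number of unrollings
      interp fuel (.seq s (.loop s))
```

This is why the completeness direction is existential (∃ fuel, …): every terminating BigStep derivation is matched by some finite amount of fuel.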


Verified Optimizations — 5 Passes, 0 Sorry

-- Each pass: Stmt → Stmt with correctness theorem
-- "If the original runs, the optimized runs too"

example (h : BigStep σ s r) : BigStep σ s.constFold r :=
  Stmt.constFold_correct h

example (h : BigStep σ s r) : BigStep σ s.deadCodeElim r :=
  Stmt.deadCodeElim_correct h

example (h : BigStep σ s r) : BigStep σ s.copyPropagation r :=
  Stmt.copyProp_correct h

example (h : BigStep σ s r) : BigStep σ s.constPropagation r :=
  Stmt.constPropagation_correct h

example (h : BigStep σ s r) (hf : σ.funs = funs) :
    ∀ depth, BigStep σ (s.inline funs depth) r :=
  Stmt.inline_correct h hf

Optimizations — What The Proofs Actually Look Like

Constant folding: Expr.inferTag conservatively infers output types to guard identity rules. e + 0 → e is sound when e produces uint64. e * 0 → 0 is NOT sound — if e fails (e.g., out-of-bounds), the original returns none but the rewrite returns some 0.
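The asymmetry is easy to see in miniature. A hypothetical toy expression language (Nat-valued for simplicity; not the Radix Expr, which is why the real pass needs Expr.inferTag to guard by type):

```lean
inductive Expr where
  | lit (n : Nat)
  | idx (i : Nat)        -- array read: may fail at evaluation time
  | add (a b : Expr)
  | mul (a b : Expr)

def Expr.eval (σ : List Nat) : Expr → Option Nat
  | .lit n => some n
  | .idx i => σ[i]?      -- out of bounds ⇒ none
  | .add a b => return (← a.eval σ) + (← b.eval σ)
  | .mul a b => return (← a.eval σ) * (← b.eval σ)

-- `e + 0 → e` preserves failure: both sides still evaluate `e`.
def Expr.fold : Expr → Expr
  | .add e (.lit 0) => e.fold
  | e => e

-- `e * 0 → 0` would not preserve failure:
--   (Expr.mul (.idx 3) (.lit 0)).eval []  is  none   (out of bounds)
--   (Expr.lit 0).eval []                  is  some 0
```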

Copy propagation: CopyMap.agrees invariant — every mapping x → y means σ.getVar x = σ.getVar y. Reset at control flow joins to stay sound.
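The invariant can be written down directly. A minimal sketch, modelling the state as a partial map from names to values (hypothetical types; the real CopyMap.agrees is stated over the full Radix state):

```lean
abbrev State := String → Option Nat
abbrev CopyMap := List (String × String)

/-- Every recorded copy `x → y` agrees with the current state. -/
def CopyMap.agrees (m : CopyMap) (σ : State) : Prop :=
  ∀ x y, (x, y) ∈ m → σ x = σ y

-- An assignment to `x` must drop every entry mentioning `x`, and at a
-- control-flow join the map is reset: neither branch's copies are
-- guaranteed to hold on the other path.
```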

Inline: rewrites callStmt into scope (frame-isolated body), bounded depth, non-recursive only. Proves function table invariant preserved through execution.


Linear Ownership — Soundness

-- LinearOk O s O' : owned-set O → stmt s → owned-set O'
-- 13 rules. Key ones:

example : LinearOk O .skip O :=
  LinearOk.skip

example (h : x ∉ O) : LinearOk O (.alloc x ty sz) (O.insert x) :=
  LinearOk.alloc h

example (h : x ∈ O) : LinearOk O (.free (.var x)) (O.erase x) :=
  LinearOk.free h

example (h₁ : LinearOk O s₁ O') (h₂ : LinearOk O' s₂ O'') :
    LinearOk O (s₁ ;; s₂) O'' :=
  LinearOk.seq h₁ h₂

Linear Ownership — The Real Invariant

The soundness proof (644 lines) maintains OwnershipInv — a three-part invariant:

  • (1) Heap well-formed: all stored addresses < nextAddr

  • (2) Liveness: every owned variable holds a live heap address

  • (3) Distinctness: different owned variables hold different addresses

-- Soundness: OwnershipInv preserved through execution
example (hlin : LinearOk O s O')
    (hstep : BigStep σ s (.normal σ'))
    (hinv : OwnershipInv σ O)
    (hwt : WellTypedFuns σ.funs) :
    OwnershipInv σ' O' :=
  LinearOk.soundness hlin hstep hinv hwt

Proved by induction on BigStep — every rule preserves all three parts. The alloc case uses Heap.alloc_fresh for distinctness. The free case uses Heap.free_preserves_ne.
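The three parts translate naturally into a structure. A minimal model with hypothetical types (the real OwnershipInv is stated over the full Radix state and heap):

```lean
structure MiniState where
  vars     : String → Option Nat   -- variable → heap address
  live     : Nat → Bool            -- which addresses are allocated
  nextAddr : Nat                   -- next fresh address

structure OwnershipInv (σ : MiniState) (O : List String) : Prop where
  heapWF   : ∀ a, σ.live a → a < σ.nextAddr         -- (1)
  liveness : ∀ x ∈ O, ∃ a, σ.vars x = some a ∧ σ.live a   -- (2)
  distinct : ∀ x ∈ O, ∀ y ∈ O, x ≠ y → σ.vars x ≠ σ.vars y -- (3)
```

Allocation must pick a fresh address, so (1) feeds (3); freeing removes a variable from O, and (2) survives for the rest because their addresses differ from the freed one.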


The 10 Agents

Not 10 interchangeable "coding agents." 10 domain experts:

  • Chris Lattner — AST, IR, module structure

  • Simon Peyton Jones — types, syntax macros

  • Xavier Leroy — verification strategy

  • Adam Chlipala — proof engineering

  • Emina Torlak — type checking, tests

  • Derek Dreyer — semantics, memory model

  • John Regehr — edge cases, coverage

  • Dan Grossman — memory safety proofs

  • Nadia Polikarpova — formal specifications

  • Tiark Rompf — interpreter, optimizations

All Claude. One weekend.


The Surprise

The agents gave us better feedback on Lean than our human users.

  • More precise: structured error context, not "it doesn't work"

  • More systematic: they hit every edge case, not just the one they needed

  • More actionable: "tactic X fails on pattern Y because Z"

Adam's grind report was better than most human post-mortems.

Optimize Lean for AI too, not just humans.

Nobody has built this yet.


The Spec and The Feedback Loop

  • Better diagnostics: when a tactic fails, structured error metadata — not just prose. Claude happily processes them.

  • Links to documentation: tactic docstrings and error messages should contain links to the actual documentation. Claude reads them.

  • Faster startup: Claude can try different approaches more efficiently.

  • More examples: and instructions on how to find them. Claude started the Radix project by reading all our examples in the core repo.

  • Attribute guides: teach Claude how to use @[simp], @[grind], etc.

  • Implicit information: a pretty printer that shows implicit arguments and avoids exponential blowup. Claude claims it will help — confirming with Anthropic.

Next experiment: Two agent teams in parallel. One builds a hard project and generates friction reports. The other patches Lean to eliminate the friction. We architect and steer.

A Probable Future

  • People will not write Lean. AI will write it. Humans read and audit.

  • People will not write Verso. AI will produce lectures, papers, slides — with machine-checked math and code. Humans focus on the content.

Humans move up the stack. They become designers, architects, readers, auditors.