Leonardo de Moura
Lean FRO | March 2026

10 AI agents built a verified embedded DSL with proved optimizations in a single weekend.
Zero sorry — 52 theorems, all complete
5 verified compiler optimizations
Determinism, type safety, memory safety — all proved
Linear ownership typing with soundness proof
Interpreter correctness — sound and complete
27 modules, ~7,400 lines of Lean

I did not touch the code. Zero lines. Full agent autonomy.
Unlike most human users, Claude reads the documentation. One stuck agent? I said "Read the grind docs." It downloaded, understood, unblocked.
Examples are powerful: agents found our DSL declaration examples and adopted the same pattern for Radix syntax.
Claude struggled with mutual induction — but found workarounds every time.
Claude struggled with monadic code and avoided it. We don't have enough examples for that.
I created a GitHub token scoped to the Radix repo only. Good thing — Claude tried to push to leanprover/lean4.
Toolchain version issues (v4.29.0) confused Claude when generating these slides.

Agents write Radix using concrete syntax macros. Verso elaborates every line:
def slideBubbleSort := `[RStmt|
let arr := new uint64[][10];
arr[0] := 5; arr[1] := 3; arr[2] := 8;
arr[3] := 1; arr[4] := 9; arr[5] := 2;
arr[6] := 7; arr[7] := 4; arr[8] := 6;
arr[9] := 0;
i := 0;
while (i < 9) {
j := 0;
while (j < 9 - i) {
let a : uint64 = arr[j];
let b : uint64 = arr[j + 1];
if (a > b) {
arr[j] := b;
arr[j + 1] := a;
}
j := j + 1;
}
i := i + 1;
}
]

-- Big-step: ⟨σ, s⟩ ⇓ r
-- 16 rules: skip, assign, decl, seqNormal, seqReturn,
-- ifTrue, ifFalse, whileTrue, whileReturn, whileFalse,
-- alloc, free, arrSet, ret, block, callStmt, scope
example : BigStep σ .skip (.normal σ) :=
BigStep.skip
example (he : e.eval σ = some v)
(hs : σ.setVar x v = some σ') :
BigStep σ (.assign x e) (.normal σ') :=
BigStep.assign he hs
example (h₁ : BigStep σ₁ s₁ (.normal σ₂))
(h₂ : BigStep σ₂ s₂ r) :
BigStep σ₁ (s₁ ;; s₂) r :=
BigStep.seqNormal h₁ h₂
example (hc : e.eval σ₁ = some (.bool true))
(hb : BigStep σ₁ b (.normal σ₂))
(hw : BigStep σ₂ (.while e b) r) :
BigStep σ₁ (.while e b) r :=
BigStep.whileTrue hc hb hw

grind Does The Work-- Same state + same statement → same result
-- 80-line proof by induction on h₁
-- grind closes equational contradictions
example (h₁ : BigStep σ s r₁)
(h₂ : BigStep σ s r₂) : r₁ = r₂ :=
BigStep.det h₁ h₂
In the real proof: cases h₂ with | assign => grind handles equational cases. grind decides contradictions like some true = some false. Explicit IH application for recursive cases (seq, while, call, scope).

A fuel-based interpreter (Stmt.interp) proved equivalent to the relational semantics in both directions:
-- Completeness: BigStep implies interp succeeds
example (h : BigStep σ s r) :
∃ fuel, s.interp fuel σ =
(.ok r.retVal, r.state) :=
Stmt.interp_complete h
-- Soundness: interp success implies BigStep
example (h : s.interp fuel σ = (.ok rv, σ')) :
BigStep σ s (toStmtResult rv σ') :=
Stmt.interp_sound h
The relational semantics is the spec. The interpreter is the implementation. These two theorems say they agree exactly.

-- Each pass: Stmt → Stmt with correctness theorem
-- "If the original runs, the optimized runs too"
example (h : BigStep σ s r) :
BigStep σ s.constFold r :=
Stmt.constFold_correct h
example (h : BigStep σ s r) :
BigStep σ s.deadCodeElim r :=
Stmt.deadCodeElim_correct h
example (h : BigStep σ s r) :
BigStep σ s.copyPropagation r :=
Stmt.copyProp_correct h
example (h : BigStep σ s r) :
BigStep σ s.constPropagation r :=
Stmt.constPropagation_correct h
example (h : BigStep σ s r) (hf : σ.funs = funs) :
∀ depth, BigStep σ (s.inline funs depth) r :=
Stmt.inline_correct h hf

Constant folding: Expr.inferTag conservatively infers output types to guard identity rules. e + 0 → e is sound when e produces uint64. e * 0 → 0 is NOT sound — if e fails (e.g., out-of-bounds), the original returns none but the rewrite returns some 0.
Copy propagation: CopyMap.agrees invariant — every mapping x → y means σ.getVar x = σ.getVar y. Reset at control flow joins to stay sound.
Inline: rewrites callStmt into scope (frame-isolated body), bounded depth, non-recursive only. Proves function table invariant preserved through execution.

-- LinearOk O s O' : owned-set O → stmt s → owned-set O'
-- 13 rules. Key ones:
example : LinearOk O .skip O :=
LinearOk.skip
example (h : x ∉ O) :
LinearOk O (.alloc x ty sz) (O.insert x) :=
LinearOk.alloc h
example (h : x ∈ O) :
LinearOk O (.free (.var x)) (O.erase x) :=
LinearOk.free h
example (h₁ : LinearOk O s₁ O')
(h₂ : LinearOk O' s₂ O'') :
LinearOk O (s₁ ;; s₂) O'' :=
LinearOk.seq h₁ h₂

The soundness proof (644 lines) maintains OwnershipInv — a three-part invariant:
(1) Heap well-formed: all stored addresses < nextAddr
(2) Liveness: every owned variable holds a live heap address
(3) Distinctness: different owned variables hold different addresses
-- Soundness: OwnershipInv preserved through execution
example (hlin : LinearOk O s O')
(hstep : BigStep σ s (.normal σ'))
(hinv : OwnershipInv σ O)
(hwt : WellTypedFuns σ.funs) :
OwnershipInv σ' O' :=
LinearOk.soundness hlin hstep hinv hwt
Proved by induction on BigStep — every rule preserves all three parts. The alloc case uses Heap.alloc_fresh for distinctness. The free case uses Heap.free_preserves_ne.

Not 10 interchangeable "coding agents." 10 domain experts:
Chris Lattner — AST, IR, module structure
Simon Peyton Jones — types, syntax macros
Xavier Leroy — verification strategy
Adam Chlipala — proof engineering
Emina Torlak — type checking, tests
Derek Dreyer — semantics, memory model
John Regehr — edge cases, coverage
Dan Grossman — memory safety proofs
Nadia Polikarpova — formal specifications
Tiark Rompf — interpreter, optimizations
All Claude. One weekend.

The agents gave us better feedback on Lean than our human users.
More precise: structured error context, not "it doesn't work"
More systematic: they hit every edge case, not just the one they needed
More actionable: "tactic X fails on pattern Y because Z"
Adam's grind report was better than most human post-mortems.
Nobody has built this yet.

Better diagnostics: when a tactic fails, structured error metadata — not just prose. Claude happily processes them.
Links to documentation: tactic docstrings and error messages should contain links to the actual documentation. Claude reads them.
Faster startup: Claude can try different approaches more efficiently.
More examples: and instructions on how to find them. Claude started the Radix project by reading all our examples in the core repo.
Attribute guides: teach Claude how to use @[simp], @[grind], etc.
Implicit information: a pretty printer that shows implicit arguments and avoids exponential blowup. Claude claims it will help — confirming with Anthropic.
Next experiment: Two agent teams in parallel. One builds a hard project and generates friction reports. The other patches Lean to eliminate the friction. We architect and steer.
People will not write Lean. AI will write it. Humans read and audit.
People will not write Verso. AI will produce lectures, papers, slides — with machine-checked math and code. Humans focus on the content.
Humans move up the stack. They become designers, architects, readers, auditors.