Finding Miscompiles for Fun, Not Profit

18 hours ago

The author, a compiler expert, used AI agents to find bugs in compilers, spending over $10,000 in one afternoon and discovering hundreds of plausible bugs in LLVM, including serious miscompiles.
The author first used Codex to build a fuzzer for LLVM's instcombine pass; it found five bugs before progress slowed. A later fuzzing effort against NVIDIA's ptxas yielded 40 miscompiles in three days, much faster than expected.
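For readers unfamiliar with the term, a miscompile means the compiled or optimized program computes something different from what the source says. The sketch below illustrates the general idea of hunting for one by differential testing. It is a toy under stated assumptions, not the author's fuzzer and not LLVM or ptxas: the 8-bit expression language, the deliberately wrong rewrite in `buggy_optimize`, and all function names are hypothetical.

```python
import random

MASK = 0xFF  # toy 8-bit unsigned arithmetic that wraps on overflow


def evaluate(expr, x):
    """Reference semantics for a tiny, made-up expression language."""
    if expr == "x":
        return x
    if isinstance(expr, int):
        return expr & MASK
    op, lhs, rhs = expr
    a, b = evaluate(lhs, x), evaluate(rhs, x)
    if op == "add":
        return (a + b) & MASK
    if op == "mul":
        return (a * b) & MASK
    if op == "udiv":
        return a // b if b else 0  # toy choice: division by zero yields 0
    raise ValueError(op)


def buggy_optimize(expr):
    """Hypothetical optimizer with a deliberate bug: (e * 2) / 2 -> e.

    The rewrite ignores that the multiply can wrap around in 8 bits.
    """
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = buggy_optimize(lhs), buggy_optimize(rhs)
    if (op == "udiv" and rhs == 2 and isinstance(lhs, tuple)
            and lhs[0] == "mul" and lhs[2] == 2):
        return lhs[1]
    return (op, lhs, rhs)


def random_expr(rng, depth=3):
    """Generate a random expression tree of bounded depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", 2, rng.randrange(256)])
    op = rng.choice(["add", "mul", "udiv"])
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def find_miscompile(seed=0, attempts=200_000):
    """Return (original, optimized, input) where the two disagree, or None."""
    rng = random.Random(seed)
    for _ in range(attempts):
        expr = random_expr(rng)
        optimized = buggy_optimize(expr)
        if optimized == expr:
            continue  # the optimizer did nothing, so there is nothing to compare
        for x in range(256):
            if evaluate(expr, x) != evaluate(optimized, x):
                return expr, optimized, x
    return None
```

Any triple returned by `find_miscompile()` is a demonstrable disagreement between the reference semantics and the optimized form, which is the property that makes fuzzer-found bugs easy to confirm.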
Model improvements (ChatGPT 5.2 to 5.5) made fuzzer development easier: tasks such as bug avoidance and test case minimization were automated, so rapid bug discovery required little manual effort.
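Test case minimization means shrinking a failing input until only the parts needed to trigger the bug remain. The following is a minimal sketch of one common approach, a greedy chunk-removal loop in the spirit of delta debugging. It is not the tooling the author used; `minimize` and the `still_fails` oracle are hypothetical names, and a real setup would have the oracle rerun the compiler and compare outputs.

```python
from typing import Callable, List


def minimize(lines: List[str],
             still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily shrink a failing test case while the failure persists.

    `still_fails` is a hypothetical oracle that returns True when the
    candidate input still triggers the bug under investigation.
    """
    assert still_fails(lines), "the starting input must reproduce the bug"
    chunk = max(len(lines) // 2, 1)
    while True:
        i = 0
        progress = False
        while i < len(lines):
            # Try dropping `chunk` lines starting at position i.
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate  # keep the smaller input, retry at same index
                progress = True
            else:
                i += chunk  # this chunk is needed, move past it
        if chunk > 1:
            chunk //= 2  # refine with smaller chunks
        elif not progress:
            return lines  # single-line pass removed nothing: locally minimal
```

The result is locally minimal only: removing any single remaining line makes the failure disappear, but a smaller failing input reachable by a different route may still exist.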
Switching to Claude, the author used subagents to directly inspect LLVM code, finding bugs at a rate of one every four minutes for AMDGPU and almost two per minute for x86, uncovering issues fuzzing might miss.
While fuzzer-found bugs are demonstrable miscompiles, agent-found bugs vary in severity; one critical bug involved atomic stores being downgraded to non-atomic stores, risking silent data corruption.
Costs varied: fuzzing was relatively cheap using ChatGPT Pro, but code inspection with subagents cost over $10,000 within hours. The author considers that cost justified because inspection finds high-impact bugs that fuzzing might not catch.
The experience highlights that tasks once impossible are now feasible but expensive, widening the gap between those with and without budgets for AI resources, and raising questions about future value and accessibility.
