LLMs Don't Write Correct Code. They Write Plausible Code
- #LLM-generated-code
- #performance-optimization
- #SQLite
- LLMs generate code that is plausible but often incorrect: one LLM-produced Rust rewrite of SQLite was 20,171 times slower than SQLite on primary key lookups.
- The Rust rewrite omits SQLite's `is_ipk` check, which recognizes that an INTEGER PRIMARY KEY column is an alias for the rowid; without it, lookups degrade to full table scans instead of direct B-tree searches.
- It also issues an `fsync` on every INSERT statement rather than once per transaction, making writes dramatically slower than SQLite, which batches syncs at commit time.
- LLMs optimize for plausibility over correctness, producing code that compiles and passes tests but fails under real-world performance scrutiny.
- Studies like METR's randomized trial and GitClear's analysis show that AI-generated code often leads to slower development and more copy-pasted, less refactored code.
- SQLite's performance comes from decades of optimization, including zero-copy page caching, prepared statement reuse, and schema cookie checks—details LLMs miss.
- The gap between LLM-generated code and correct implementations highlights the need for developers to define and measure specific correctness criteria when using AI tools.
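The INTEGER PRIMARY KEY point is easy to see from SQLite itself: because such a column aliases the rowid, a lookup on it is a direct B-tree search, while a lookup on an unindexed column is a scan. A minimal sketch using Python's built-in `sqlite3` module (the table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO t (name) VALUES (?)",
                [("row%d" % i,) for i in range(100)])

# Lookup by the INTEGER PRIMARY KEY: SQLite resolves `id` to the rowid
# and does a direct B-tree search. The plan detail typically reads
# something like "SEARCH t USING INTEGER PRIMARY KEY (rowid=?)".
(plan_pk,) = [row[3] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE id = 42")]
print(plan_pk)

# Lookup by an unindexed column: a full table scan ("SCAN t").
(plan_scan,) = [row[3] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE name = 'row42'")]
print(plan_scan)
```

The exact wording of the plan text varies by SQLite version, but the SEARCH-vs-SCAN distinction is exactly the optimization the `is_ipk` check enables.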
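The fsync issue can likewise be reproduced from the application side: each transaction commit forces a journal sync (with SQLite's default `PRAGMA synchronous` setting), so committing per statement pays one sync per row, while one transaction amortizes a single sync over all rows. A sketch, again with Python's `sqlite3` module and an illustrative schema:

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

# Commit after every INSERT: each commit syncs the journal to disk,
# so 100 inserts pay roughly 100 syncs.
t0 = time.perf_counter()
for i in range(100):
    con.execute("INSERT INTO t (v) VALUES (?)", (str(i),))
    con.commit()
per_statement = time.perf_counter() - t0

# The same 100 inserts inside one transaction: a single commit, a
# single sync. `with con:` commits once on clean exit.
t0 = time.perf_counter()
with con:
    con.executemany("INSERT INTO t (v) VALUES (?)",
                    ((str(i),) for i in range(100)))
batched = time.perf_counter() - t0

count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 200
```

On typical hardware the batched variant is orders of magnitude faster; an implementation that syncs on every INSERT, as the Rust rewrite reportedly does, locks in the slow path permanently.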