LLMs Don't Write Correct Code. They Write Plausible Code
- #LLM-generated-code
- #performance-optimization
- #SQLite
- LLMs generate code that is plausible but often incorrect: one LLM-produced Rust rewrite of SQLite was 20,171 times slower than SQLite on primary key lookups.
- The Rust rewrite omits SQLite's `is_ipk` check, which recognizes that an INTEGER PRIMARY KEY column is an alias for the rowid; without it, lookups degrade to full table scans instead of direct B-tree searches.
- It also issues an `fsync` on every INSERT statement rather than once per transaction, making writes dramatically slower than SQLite, which batches syncs at commit time.
- LLMs optimize for plausibility over correctness, producing code that compiles and passes tests but fails under real-world performance scrutiny.
- Studies like METR's randomized trial and GitClear's analysis show that AI-generated code often leads to slower development and more copy-pasted, less refactored code.
- SQLite's performance comes from decades of optimization, including zero-copy page caching, prepared statement reuse, and schema cookie checks—details LLMs miss.
- The gap between LLM-generated code and correct implementations highlights the need for developers to define and measure specific correctness criteria when using AI tools.
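The INTEGER PRIMARY KEY point is easy to see from SQLite itself: because such a column aliases the rowid, a lookup on it is a direct B-tree search, while a lookup on an unindexed column is a scan. A minimal sketch using Python's built-in `sqlite3` module (the table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO t (name) VALUES (?)",
                [("row%d" % i,) for i in range(100)])

# Lookup by the INTEGER PRIMARY KEY: SQLite resolves `id` to the rowid
# and does a direct B-tree search. The plan detail typically reads
# something like "SEARCH t USING INTEGER PRIMARY KEY (rowid=?)".
(plan_pk,) = [row[3] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE id = 42")]
print(plan_pk)

# Lookup by an unindexed column: a full table scan ("SCAN t").
(plan_scan,) = [row[3] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE name = 'row42'")]
print(plan_scan)
```

The exact wording of the plan text varies by SQLite version, but the SEARCH-vs-SCAN distinction is exactly the optimization the `is_ipk` check enables.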
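The fsync issue can likewise be reproduced from the application side: each transaction commit forces a journal sync (with SQLite's default `PRAGMA synchronous` setting), so committing per statement pays one sync per row, while one transaction amortizes a single sync over all rows. A sketch, again with Python's `sqlite3` module and an illustrative schema:

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

# Commit after every INSERT: each commit syncs the journal to disk,
# so 100 inserts pay roughly 100 syncs.
t0 = time.perf_counter()
for i in range(100):
    con.execute("INSERT INTO t (v) VALUES (?)", (str(i),))
    con.commit()
per_statement = time.perf_counter() - t0

# The same 100 inserts inside one transaction: a single commit, a
# single sync. `with con:` commits once on clean exit.
t0 = time.perf_counter()
with con:
    con.executemany("INSERT INTO t (v) VALUES (?)",
                    ((str(i),) for i in range(100)))
batched = time.perf_counter() - t0

count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 200
```

On typical hardware the batched variant is orders of magnitude faster; an implementation that syncs on every INSERT, as the Rust rewrite reportedly does, locks in the slow path permanently.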