Our first outage from LLM-written code
9 months ago
- #LLM
- #Outage
- #CodeReview
- A series of mini-outages at sketch.dev on July 15th were caused by LLM-written code.
- The deployment initially seemed stable, but CPU usage later spiked and complex SQL queries slowed the service down.
- The issue was traced to a refactored code path in which a 'break' had been mistakenly changed to a 'continue', causing infinite loops.
- An LLM introduced this transcription error while moving the code, and human review did not catch it.
- Prevention measures include adding clipboard support to Sketch's agent environment for more accurate code transcription.
- The incident highlights the need for better tooling, like git cross-hunk change detection, to catch such errors.
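
To illustrate why swapping 'break' for 'continue' can turn an error path into an infinite loop, here is a minimal Python sketch. It is not the code from the incident: the function `poll_until_done`, the `on_error` switch, and the `max_iterations` cap are all hypothetical, and the cap exists only so the demonstration terminates.

```python
def poll_until_done(fetch, on_error="break", max_iterations=1000):
    """Call fetch() in a loop and return (iterations, hit_cap).

    Hypothetical illustration only; on_error selects which keyword the
    error path uses so both variants can be compared side by side.
    """
    iterations = 0
    while iterations < max_iterations:  # safety cap so the demo terminates
        iterations += 1
        try:
            result = fetch()
        except RuntimeError:
            if on_error == "break":
                break  # intended behaviour: give up on a permanent error
            continue  # transcription error: retry immediately, forever
        if result == "done":
            break
    return iterations, iterations >= max_iterations


def always_failing():
    """Stand-in for an operation that fails permanently."""
    raise RuntimeError("permanent failure")


if __name__ == "__main__":
    print(poll_until_done(always_failing, on_error="break"))
    print(poll_until_done(always_failing, on_error="continue"))
```

With 'break', the loop exits after the first failure. With 'continue', it spins until the artificial cap is reached; without that cap it would never exit, which is consistent with the CPU spike described above. The two variants differ by a single keyword, which is why this kind of change is easy to miss when reviewing moved code.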