Coding Models Are Doing Too Much
- #Over-Editing problem
- #AI coding tools
- #Code minimality
- AI-assisted coding tools often over-edit: they modify far more code than a fix requires, which complicates code review and maintenance.
- The 'Over-Editing problem' is quantified with two metrics: the token-level Levenshtein distance between the original and edited code, and the cognitive complexity the edit adds.
- Frontier models like GPT-5.4 over-edit substantially, while Claude Opus 4.6 performs best on both edit minimality and correctness.
- Explicit prompts to preserve the original code reduce over-editing, especially in reasoning models, indicating a behavioral issue rather than a capability gap.
- Training via reinforcement learning (RL) effectively teaches models minimal editing without degrading general coding ability, scaling across model sizes.
- SFT alone generalizes poorly and causes catastrophic forgetting, whereas RL (and LoRA adapters) can adapt a model's editing style without the overhead of full fine-tuning.
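The token-level Levenshtein metric from the summary can be sketched in a few lines of Python. This is a minimal illustration, not the post's actual implementation: the regex tokenizer and the example snippets are assumptions chosen to show how an edit's size is scored.

```python
import re

def tokenize(code: str) -> list[str]:
    # Crude tokenizer (an assumption): words/numbers, or single punctuation chars.
    return re.findall(r"\w+|[^\w\s]", code)

def levenshtein(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming edit distance over token sequences,
    # keeping only the previous row to stay O(min-memory).
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        curr = [i]
        for j, tb in enumerate(b, 1):
            cost = 0 if ta == tb else 1
            curr.append(min(prev[j] + 1,          # delete token from a
                            curr[j - 1] + 1,      # insert token from b
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

# Hypothetical before/after pair: a renamed-variable "fix" plus an added comment.
original = "def add(a, b):\n    return a + b"
edited   = "def add(x, y):\n    # sum two values\n    return x + y"
print(levenshtein(tokenize(original), tokenize(edited)))  # → 8
```

A higher score for a semantically equivalent fix is exactly the over-editing signal: the gratuitous rename and comment cost 8 token edits even though the behavior is unchanged.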