Hasty Briefs (beta)

Are LLMs not getting better?

2 days ago
  • #Performance
  • #LLM
  • #Programming
  • LLM-generated code passes automated tests more often than it meets the quality bar maintainers require before merging.
  • Performance drops significantly when success is measured by maintainer approval rather than test passing.
  • Merge rates for LLM-generated code show no improvement since early 2025, contrary to some claims.
  • In a Brier score comparison (lower is better), a constant merge rate model outperforms models that assume a linear or logistic growth trend.
  • Claims of recent capability improvements lack rigorous evidence, much like the unsubstantiated claims made in 2025.
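
The Brier score comparison mentioned above can be sketched roughly as follows. The Brier score is the mean squared difference between a predicted probability and the 0/1 outcome, so lower is better. This is a minimal illustration, not the original analysis: the `times` and `outcomes` values are made-up placeholders, the function names are hypothetical, the logistic model is omitted for brevity, and the leave-one-out evaluation is an assumption. Held-out scoring matters because a linear fit scored on its own training data can never look worse than a constant one.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def fit_constant(times, outcomes):
    """Constant model: always predict the observed merge rate."""
    rate = sum(outcomes) / len(outcomes)
    return lambda t: rate


def fit_linear(times, outcomes):
    """Linear trend: least-squares line over time, clipped to [0, 1]."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(outcomes) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, outcomes))
    slope = cov / var_t if var_t else 0.0
    intercept = mean_y - slope * mean_t
    return lambda t: min(1.0, max(0.0, intercept + slope * t))


def loo_brier(fit, times, outcomes):
    """Leave-one-out Brier score: predict each point from all the others."""
    preds = []
    for i in range(len(times)):
        train_t = times[:i] + times[i + 1:]
        train_y = outcomes[:i] + outcomes[i + 1:]
        model = fit(train_t, train_y)
        preds.append(model(times[i]))
    return brier_score(preds, outcomes)


# Synthetic placeholder data (NOT from the article): one attempt per
# time step, 1 = merged by a maintainer, 0 = rejected.
times = list(range(12))
outcomes = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

constant_score = loo_brier(fit_constant, times, outcomes)
linear_score = loo_brier(fit_linear, times, outcomes)
print(f"constant model Brier score: {constant_score:.4f}")
print(f"linear model Brier score:   {linear_score:.4f}")
```

If the constant model scores lower on held-out points, the data give no support for a trend. Under these assumptions, that is the shape of the argument in the bullet above.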