Why eval startups fail (2025)

2 days ago
  • #AI evaluation
  • #startup challenges
  • #benchmark gaming
  • Eval startups often fail due to talent attrition, as skilled evaluators can earn more and gain greater influence in other areas like post-training or application development.
  • The market for independent eval startups is small: the target customer must be technical enough to build on model APIs yet not technical enough to run their own evals, and few teams fit both conditions.
  • Eval startups face heavy optimization pressure from large AI labs that game public benchmarks. Under Goodhart's Law, once a benchmark becomes a target it stops being a reliable measure, so published evals lose value over time.
  • Safety eval startups are an exception because they attract ideologically driven talent, serve technical clients needing external validation, and may benefit from regulatory demands.
  • Startups selling research evals to big labs are likely to fail because labs won't outsource setting their research agenda, and outsourcing adds latency to model iteration.
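
The Goodhart's Law point above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not anything taken from the original article: every model variant is assumed to have the same true quality, the public benchmark and a private held-out eval each add independent Gaussian noise, and a lab reports whichever variant scores best on the public benchmark. The function names and all parameter values (`best_of_n_gap`, `mean_gap`, `true_score`, `noise`) are hypothetical.

```python
import random
import statistics


def best_of_n_gap(n_variants=50, true_score=0.70, noise=0.03, rng=None):
    """Gap between public and held-out score for the variant picked on the public benchmark.

    Assumption: all variants share the same true quality, so any gap is pure
    selection effect, not real improvement.
    """
    if rng is None:
        rng = random.Random(0)
    public = [true_score + rng.gauss(0, noise) for _ in range(n_variants)]
    heldout = [true_score + rng.gauss(0, noise) for _ in range(n_variants)]
    # "Gaming" here just means choosing the variant that looks best publicly.
    best = max(range(n_variants), key=public.__getitem__)
    return public[best] - heldout[best]


def mean_gap(trials=200, seed=0, **kwargs):
    """Average the selection gap over many independent trials."""
    rng = random.Random(seed)
    return statistics.mean(best_of_n_gap(rng=rng, **kwargs) for _ in range(trials))


if __name__ == "__main__":
    print(f"1 variant tried:   mean gap = {mean_gap(n_variants=1):+.3f}")
    print(f"50 variants tried: mean gap = {mean_gap(n_variants=50):+.3f}")
```

With a single variant the average gap should sit near zero; with many variants the selected one tends to score noticeably higher on the public benchmark than on the held-out eval, even though nothing about the model improved. Real benchmark gaming involves more than selection on noise, so this only captures one narrow mechanism.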