Why eval startups fail (2025)

2 days ago

Eval startups often fail due to talent attrition, as skilled evaluators can earn more and gain greater influence in other areas like post-training or application development.
The market for independent eval startups is small: their target customers must be technical enough to build on APIs but not technical enough to run their own evals, and few developers fall in that overlap.
Eval startups also face optimization pressure from large AI labs that game public benchmarks; by Goodhart's Law, a benchmark that becomes a target stops being a reliable measure.
Safety eval startups are an exception because they attract ideologically driven talent, serve technical clients needing external validation, and may benefit from regulatory demands.
Startups selling research evals to big labs are likely to fail because labs won't outsource setting their research agenda, and outsourcing adds latency to model iteration.
