Anthropic's definition of safety is too narrow
- #AI Safety
- #Trust in Tech
- #Product Management
- Anthropic faced trust issues in April due to product reliability, pricing changes, and poor communication, not just model safety concerns.
- Incidents like the Claude Code quality degradation and a pricing experiment that removed features from Pro plans eroded developer trust, shifting public sentiment from goodwill to cynicism.
- Anthropic demonstrated discipline by restricting access to Mythos for safety reasons, but failed to apply similar rigor to product and pricing decisions.
- A postmortem on Claude Code identified specific issues, such as a drop in default reasoning effort and several bugs, but offered no systemic process analysis and conflated product-judgment decisions with technical defects.
- Pricing communication treated developers as A/B test cohorts, causing confusion and anger and highlighting the adoption risk for companies that depend on stable toolchain costs.
- There is an organizational divide between model safety (owned by research) and downstream decisions (product, growth, pricing), and trust is being spent in areas that fall outside traditional safety boundaries.
- Trust is not compartmentalized: incidents in product or pricing erode the credibility of safety claims and undermine Anthropic's brand promise as a trustworthy AI lab.
- Safety should extend beyond model behavior: evals, staged rollouts, stable pricing, and clear communication are essential to maintaining trust in real-world deployments (a rough sketch of this idea follows below).
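
To make the last point concrete, here is a minimal, hypothetical sketch of what treating evals and staged rollouts as part of the safety surface could look like. All names here (`EvalResult`, `gate_rollout`, `ROLLOUT_STAGES`) are illustrative assumptions, not any real Anthropic tooling: the point is simply that a release promotes to the next audience only when its eval gates pass.

```python
# Hypothetical sketch: eval-gated staged rollout.
# None of these names correspond to real Anthropic tooling.
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    score: float
    threshold: float

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold


# Each stage widens the audience; promotion requires all gates to pass.
ROLLOUT_STAGES = ["internal", "beta", "pro", "general"]


def gate_rollout(results: list[EvalResult], stage: str) -> str:
    """Hold the rollout at the current stage if any eval gate fails."""
    failures = [r for r in results if not r.passed]
    if failures:
        names = ", ".join(r.name for r in failures)
        return f"HOLD at '{stage}': failing evals: {names}"
    idx = ROLLOUT_STAGES.index(stage)
    if idx + 1 < len(ROLLOUT_STAGES):
        return f"PROMOTE '{stage}' -> '{ROLLOUT_STAGES[idx + 1]}'"
    return "Fully rolled out"


if __name__ == "__main__":
    results = [
        EvalResult("code-quality-regression", score=0.97, threshold=0.95),
        EvalResult("reasoning-effort-parity", score=0.88, threshold=0.95),
    ]
    print(gate_rollout(results, "beta"))
    # -> HOLD at 'beta': failing evals: reasoning-effort-parity
```

Under this framing, a change like a default reasoning-effort drop would trip a gate before reaching paying users, which is exactly the kind of discipline the post argues should count as safety work.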