Anthropic's definition of safety is too narrow
- #AI Safety
- #Trust in Tech
- #Product Management
- Anthropic faced trust issues in April due to product reliability, pricing changes, and poor communication, not just model safety concerns.
- Incidents like the Claude Code quality degradation and a pricing experiment that removed features from Pro plans eroded developer trust, shifting public sentiment from goodwill to cynicism.
- Anthropic demonstrated discipline by restricting access to Mythos for safety reasons, but failed to apply similar rigor to product and pricing decisions.
- A postmortem on Claude Code identified specific issues, such as a drop in default reasoning effort and several bugs, but offered no systemic process analysis and conflated product-judgment decisions with technical defects.
- Pricing communication treated developers as A/B test cohorts, causing confusion and anger and highlighting the adoption risk for companies that depend on stable toolchain costs.
- There is an organizational divide between model safety (owned by research) and downstream decisions (product, growth, pricing), and trust is being spent in areas that fall outside traditional safety boundaries.
- Trust is not compartmentalized: incidents in product or pricing erode the credibility of safety claims and undermine Anthropic's brand promise as a trustworthy AI lab.
- Safety should extend beyond model behavior: evals, staged rollouts, stable pricing, and clear communication are essential to maintaining trust in real-world deployments (a rough sketch of this idea follows below).
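
To make the last point concrete, here is a minimal, hypothetical sketch of what treating evals and staged rollouts as part of the safety surface could look like. All names here (`EvalResult`, `gate_rollout`, `ROLLOUT_STAGES`) are illustrative assumptions, not any real Anthropic tooling: the point is simply that a release promotes to the next audience only when its eval gates pass.

```python
# Hypothetical sketch: eval-gated staged rollout.
# None of these names correspond to real Anthropic tooling.
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    score: float
    threshold: float

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold


# Each stage widens the audience; promotion requires all gates to pass.
ROLLOUT_STAGES = ["internal", "beta", "pro", "general"]


def gate_rollout(results: list[EvalResult], stage: str) -> str:
    """Hold the rollout at the current stage if any eval gate fails."""
    failures = [r for r in results if not r.passed]
    if failures:
        names = ", ".join(r.name for r in failures)
        return f"HOLD at '{stage}': failing evals: {names}"
    idx = ROLLOUT_STAGES.index(stage)
    if idx + 1 < len(ROLLOUT_STAGES):
        return f"PROMOTE '{stage}' -> '{ROLLOUT_STAGES[idx + 1]}'"
    return "Fully rolled out"


if __name__ == "__main__":
    results = [
        EvalResult("code-quality-regression", score=0.97, threshold=0.95),
        EvalResult("reasoning-effort-parity", score=0.88, threshold=0.95),
    ]
    print(gate_rollout(results, "beta"))
    # -> HOLD at 'beta': failing evals: reasoning-effort-parity
```

Under this framing, a change like a default reasoning-effort drop would trip a gate before reaching paying users, which is exactly the kind of discipline the post argues should count as safety work.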