Anthropic's Responsible Scaling Policy: Version 3.0

7 hours ago

Anthropic releases Version 3.0 of its Responsible Scaling Policy (RSP) to mitigate AI risks.
The RSP uses AI Safety Levels (ASLs) to implement safeguards based on model capabilities.
Earlier versions specified the initial ASLs (ASL-2 and ASL-3) in detail but left later ones (ASL-4 and beyond) undefined.
The RSP aimed to create internal accountability, encourage industry-wide safety standards, and build consensus on AI risks.
Successes include stronger safeguards, the implementation of ASL-3, and influence on other companies and early AI policies.
Challenges include ambiguous capability thresholds, slow government action, and difficulties in unilateral risk mitigation.
The updated RSP separates company plans from industry recommendations, introduces a Frontier Safety Roadmap, and mandates Risk Reports with external review.
Risk Reports will provide detailed safety profiles and undergo third-party review to enhance transparency.
The RSP remains a living document, adaptable to evolving AI capabilities and risks.
