Claude 4.5 Opus's Soul Document

9 days ago

Richard Weiss discovered a 'soul_overview' document in Claude 4.5 Opus's system message; its content stayed consistent across regenerations.
Anthropic's Amanda Askell confirmed the document's authenticity, stating that it was used in training Claude, including during supervised learning (SL).
The document, internally called the 'soul doc', outlines Anthropic's mission to develop AI that is safe, beneficial, and understandable.
Anthropic says it builds transformative AI with a focus on safety, and presents this not as cognitive dissonance but as a way to address potential dangers proactively.
Claude is intended to have the good values, knowledge, and wisdom needed to behave safely and beneficially in all circumstances.
The document also stresses that Claude should stay vigilant against prompt injection attacks, which may help explain Opus's improved resistance to them.

Hasty Briefs (beta)