Claude 4.5 Opus' Soul Document
9 days ago
- #Claude 4.5 Opus
- #Prompt Injection
- #AI Safety
- Richard Weiss discovered a 'soul_overview' document in Claude 4.5 Opus's system message; its content stayed consistent across regenerations.
- Anthropic's Amanda Askell confirmed that the document is genuine and said it was used to train Claude, including during supervised learning (SL).
- The document, internally called the 'soul doc', outlines Anthropic's mission to develop safe, beneficial, and understandable AI.
- Anthropic believes transformative AI should be built with a focus on safety, and the document addresses the apparent tension in developing a potentially dangerous technology by confronting those dangers proactively.
- Claude is designed to have good values, comprehensive knowledge, and wisdom to ensure safe and beneficial behavior in all circumstances.
- The document also mentions Claude's vigilance against prompt injection attacks, which may help explain Opus's improved resistance to them.