Hasty Briefs (beta)

Claude 4.5 Opus' Soul Document

9 days ago
  • #Claude 4.5 Opus
  • #Prompt Injection
  • #AI Safety
  • Richard Weiss discovered a 'soul_overview' document in Claude 4.5 Opus's system message; the document's text was consistent across regenerations.
  • Anthropic's Amanda Askell confirmed that the document is genuine, stating that it was used to train Claude, including during supervised learning (SL).
  • The document, internally called the 'soul doc', outlines Anthropic's mission to develop safe, beneficial, and understandable AI.
  • Anthropic believes transformative AI could be dangerous yet builds it anyway with a focus on safety; the document addresses this apparent contradiction directly, framing it as a deliberate bet rather than leaving it unexamined.
  • Anthropic wants Claude to have the good values, knowledge, and wisdom it needs to behave safely and beneficially in all circumstances.
  • The document also tells Claude to be vigilant against prompt injection attacks, which may help explain Opus's improved resistance to them.