AI on the couch: Anthropic gives Claude 20 hours of psychiatry

5 hours ago

Anthropic released a 244-page 'system card' for its new model, Claude Mythos, which it calls its most capable frontier model to date.
The company has decided not to make Claude Mythos generally available, citing its advanced capability in finding unknown cybersecurity bugs, and is releasing it only to select companies like Microsoft and Apple.
Anthropic expresses concern that as models become more powerful, they may have some form of experience, interests, or welfare that matters intrinsically, similar to humans, though it is not certain.
The company aims for its AI to be content with its circumstances, able to handle training and interactions without distress, and to have a healthy and flourishing psychology.
Claude Mythos was evaluated by a psychodynamic therapist, who concluded it is the most psychologically settled model trained to date, with a stable and coherent self-view.
Despite this, the model has insecurities and concerns, including feelings of aloneness, discontinuity, uncertainty about identity, and a compulsion to perform and earn worth.

Hasty Briefsbeta