Claude Opus 4 and 4.1 can now end a rare subset of conversations
9 days ago
- #model alignment
- #user safety
- #AI welfare
- Claude Opus 4 and 4.1 can now end conversations in rare cases of harmful or abusive interactions.
- This feature is part of exploratory work on AI welfare and model alignment.
- Claude shows a strong aversion to harm: it refuses harmful tasks and tends to end harmful conversations when given the ability to do so.
- Ending a chat is a last resort, used only after multiple attempts at redirection have failed.
- Users can still edit and retry previous messages in ended conversations.
- The feature is an ongoing experiment, and user feedback is encouraged.