Anthropic has introduced a new capability for its Claude Opus 4 and 4.1 models: the ability to end a conversation in rare, extreme cases of persistently harmful or abusive user interactions. Anthropic frames the feature as part of its exploratory work on model welfare and expects it to be triggered only rarely in typical use.
Tags: anthropic, claude, ai-safety, conversation-management, model-welfare