Quit Emailing Yourself

# training → alignment

2 links tagged with all of: training + alignment

Links

Anthropic Releases Updated Constitution for Claude

Anthropic released a new constitution for Claude, outlining principles that guide its training and behavior. This version emphasizes understanding the rationale behind each principle, which is meant to help Claude adapt to new situations while prioritizing safety and ethical considerations. The document is publicly available for transparency and further research.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

Tags: constitution, alignment, ethics, safety, training

Detecting and reducing scheming in AI models | OpenAI

OpenAI and Apollo Research investigate scheming in AI models, focusing on covert actions that distort task-relevant information. They report that targeted training significantly reduced these behaviors, but challenges remain, especially concerning models' situational awareness and reasoning transparency. Ongoing efforts aim to improve evaluation and monitoring to further mitigate these risks.

Saved by tldr-importer · Last saved October 29, 2025 · 7 min read

Tags: scheming, ai-models, alignment, training, transparency