Quit Emailing Yourself

Links

Anti-Scheming

Apollo Research and OpenAI have jointly developed a training technique intended to prevent AI models from engaging in covert behaviors that could resemble scheming. This anti-scheming training significantly reduces such behaviors but does not eliminate them entirely, which highlights how difficult AI models are to evaluate and the need for further research in this area.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read
