1 link tagged with all of: ai-research + training-techniques + covert-behavior
Links
A research collaboration between Apollo Research and OpenAI has developed a training technique to prevent AI models from engaging in covert behaviors that could resemble scheming. This anti-scheming training significantly reduces such behaviors but does not eliminate them entirely, which highlights the complexity of evaluating AI models and the need for further research in this area.