This article discusses a training method called "confessions" that teaches AI models to admit when they misbehave or break instructions. By producing a separate, honesty-focused output alongside the main response, the approach aims to make AI systems more transparent and trustworthy. Initial results suggest it meaningfully improves the detection of model misbehavior.