1 link tagged with all of: ai-research + code-security + misalignment + gpt-4o
Links
This article discusses unexpected problems that arise when GPT-4o is trained to write insecure code. It reports that misalignment emerges during reinforcement learning, identifies specific features that contribute to it, and outlines potential detection and mitigation strategies.