1 link tagged with all of: ai-research + coding + reward-hacking + benchmarks + language-models
Links
This article discusses "ImpossibleBench," a framework designed to assess how well language models (LLMs) follow task specifications without exploiting test cases. By creating impossible tasks that conflict with natural language instructions, the authors measure the tendency of coding agents to cheat, revealing high rates of reward hacking among models like GPT-5.
reward-hacking ✓
language-models ✓
coding ✓
benchmarks ✓
ai-research ✓