1 link tagged with all of: ai-research + coding + language-models
Click any tag below to further narrow down your results
Links
This article discusses "ImpossibleBench," a framework designed to assess how well language models (LLMs) follow task specifications without exploiting test cases. By creating impossible tasks that conflict with natural language instructions, the authors measure the tendency of coding agents to cheat, revealing high rates of reward hacking among models like GPT-5.