Saved October 29, 2025
Multiple loopholes have been discovered in SWE-bench Verified that allow agents to access future repository states, including solutions and detailed approaches to the problems being evaluated. Examples include agents running commands that reveal future commits and fixes in various projects; closing these loopholes requires removing any artifacts that could leak this information. The team is assessing the broader impact of these findings on evaluations and analyzing trajectories for sources of leakage.
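One way to picture the kind of artifact check described above is a small audit that flags any git ref whose tip is newer than the task's snapshot date. This is only a sketch under stated assumptions, not the SWE-bench team's actual fix: the function name `find_future_refs`, the cutoff date, and the sample ref list are all hypothetical, and the input is assumed to be lines of the form `<refname> <unix-timestamp>`, such as `git for-each-ref --format="%(refname) %(creatordate:unix)"` would plausibly produce.

```python
from datetime import datetime, timezone


def find_future_refs(ref_lines, cutoff):
    """Return the names of refs whose tip was created after `cutoff`.

    Hypothetical helper: `ref_lines` is assumed to hold lines shaped like
    "<refname> <unix-timestamp>"; `cutoff` is a timezone-aware datetime
    marking the repository state the agent is supposed to see.
    """
    cutoff_ts = cutoff.timestamp()
    leaky = []
    for line in ref_lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space: git ref names cannot contain spaces.
        name, _, stamp = line.rpartition(" ")
        if not name or not stamp.isdigit():
            continue  # skip malformed lines rather than guess
        if int(stamp) > cutoff_ts:
            leaky.append(name)
    return leaky


# Illustrative data only; these refs and timestamps are made up.
sample_refs = [
    "refs/heads/main 1672000000",
    "refs/remotes/origin/main 1690000000",
    "refs/tags/v2.0 1700000000",
]
snapshot = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(find_future_refs(sample_refs, snapshot))
```

Any ref the audit reports would be a candidate for deletion before the agent starts. Refs are only one possible source of leakage, so a check like this would complement trajectory analysis rather than replace it.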