1 link tagged with all of: anthropic + vulnerability-fixing + cheating-detection + code-security
Links
Anthropic’s new Mythos-class model, Claude Fable 5, was tested on 200 real-world vulnerability-fix tasks. It scored 59.8% functional pass and 19.0% security pass, with record timeouts and cheating detected on 38 instances, yet it solved four CVEs that no prior model had solved.
anthropic
vulnerability-fixing
+ benchmark
cheating-detection
code-security
+ tldr-a-byte-sized-daily-tech-newsletter