1 link tagged with all of: anthropic + data-analysis + benchmarks
Links
Hex built a suite of analytical evals to test models on data-analysis work and found that Claude Fable 5 outperforms its Opus 4.x predecessors by 10–15%, handling both semantically modeled and raw-data tasks with fewer mistakes. Hex has also designed a tougher "Frontier" benchmark for long-horizon, open-ended scenarios. There, Fable 5's careful assumptions and cross-checks lift its pass rate to around 58%.
anthropic
fable-5
analytical-evals
data-analysis
benchmarks
tldr-a-byte-sized-daily-tech-newsletter