1 link tagged with all of: benchmarks + models + reasoning + sudoku + ai
Links
Sakana AI's Sudoku-Bench tests AI reasoning with handcrafted sudoku puzzles. GPT-5 achieves a 33% solve rate, outperforming previous models, but it still struggles with complex puzzles. The article explores the limitations of current AI reasoning methods and argues that further research is needed.
Tags: sudoku, ai, reasoning, benchmarks, models