Reasoning models, which use extended chain-of-thought (CoT) reasoning, outperform non-reasoning models both at solving problems and at expressing their confidence accurately. This study benchmarks six reasoning models across various datasets and finds that their slow-thinking behaviors lead to better confidence calibration. The findings also indicate that non-reasoning models can improve their calibration when guided toward slow-thinking techniques.
reasoning-models
confidence-calibration
chain-of-thought
artificial-intelligence
machine-learning