This study examines the strengths and limitations of recent Large Reasoning Models (LRMs) through the lens of problem complexity. By systematically analyzing reasoning traces in controlled puzzle environments, it finds that LRMs struggle with high-complexity tasks, exhibiting accuracy collapse and inconsistent reasoning patterns. These findings challenge prevailing assumptions about LRMs' true reasoning capabilities and highlight the need for evaluation methods that go beyond traditional benchmarks.
reasoning-models
problem-complexity
language-models
evaluation-methods
computational-behavior