Hasty Briefs (beta)


The Illusion of the Illusion of Thinking – A Comment on Shojaee et al. (2025)

a year ago
  • #Experimental Design
  • #Reasoning Models
  • #Artificial Intelligence
  • Shojaee et al. (2025) report 'accuracy collapse' in Large Reasoning Models (LRMs) on complex planning puzzles.
  • The comment identifies three experimental design limitations that undermine the reported findings:
  • 1. The Tower of Hanoi experiments require outputs that exceed model token limits, and the models explicitly acknowledge this constraint before truncating their answers.
  • 2. The automated evaluation cannot distinguish genuine reasoning failures from these practical output constraints.
  • 3. The River Crossing benchmark includes mathematically unsolvable instances, yet models are scored as failures for not solving them.
  • When these artifacts are controlled for, models show high accuracy on Tower of Hanoi instances previously scored as failures.
  • The comment underscores the importance of careful experimental design when evaluating AI reasoning capabilities.
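The token-limit point above is easy to verify arithmetically: a minimal Tower of Hanoi solution requires 2^n - 1 moves, so an exhaustive move list grows exponentially with disk count. A minimal sketch, where the tokens-per-move cost and the output budget are illustrative assumptions (not figures from either paper):

```python
def hanoi_moves(n: int) -> int:
    """Minimal number of moves to solve an n-disk Tower of Hanoi: 2^n - 1."""
    return 2**n - 1

# Assumed, illustrative values: cost of one "move disk X from A to B" line
# and a plausible model output-token limit.
TOKENS_PER_MOVE = 10
OUTPUT_BUDGET = 64_000

for n in (7, 10, 12, 15, 20):
    tokens = hanoi_moves(n) * TOKENS_PER_MOVE
    verdict = "fits" if tokens <= OUTPUT_BUDGET else "exceeds budget"
    print(f"n={n:2d}: {hanoi_moves(n):>9,} moves ~ {tokens:>10,} tokens ({verdict})")
```

Under these assumptions, instances around 13+ disks cannot be answered with a full move list regardless of the model's reasoning ability, which is the distinction the comment argues the automated evaluation fails to make.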