Reasoning Models Reason Well, Until They Don't

6 months ago
  • #Reasoning
  • #Large Language Models
  • #Artificial Intelligence
  • Large language models (LLMs) have made progress on reasoning tasks but fail as problem complexity increases.
  • Large reasoning models (LRMs) are fine-tuned for step-by-step reasoning and self-verification.
  • LRMs perform well on benchmarks like NLGraph but struggle with more complex problems.
  • A new dataset, the Deep Reasoning Dataset (DeepRD), is introduced to evaluate models on problems whose complexity can be scaled.
  • LRM performance drops abruptly once problems become sufficiently complex, and the models do not generalize beyond that point.
  • Most real-world knowledge graphs fall within the LRMs' success regime, but the long tail of more complex cases reveals failure potential.
  • The study highlights LRMs' utility but calls for new methods to handle higher complexity.
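
To make "scalable complexity" concrete, here is a minimal sketch of a graph path problem whose difficulty is controlled by a single `depth` parameter. It is an illustration only, not the DeepRD generative process or the NLGraph format: the function names (`make_path_problem`, `shortest_path_length`, `to_prompt`), the dead-end distractor branches, and the prompt wording are all assumptions made for this example.

```python
import random
from collections import deque


def make_path_problem(depth, branches, branch_len, seed=0):
    """Build a directed graph whose only source-to-target route has `depth` edges.

    Hypothetical generator for illustration; not the paper's method.
    """
    rng = random.Random(seed)
    edges = [(i, i + 1) for i in range(depth)]  # the single true path 0 -> depth
    next_node = depth + 1
    for _branch in range(branches):
        # Each distractor leaves the true path and dead-ends on fresh nodes,
        # so it can never create a shortcut to the target.
        prev = rng.randrange(depth)
        for _step in range(branch_len):
            edges.append((prev, next_node))
            prev = next_node
            next_node += 1
    # Relabel nodes and shuffle edges so the path cannot be read off the ordering.
    labels = list(range(next_node))
    rng.shuffle(labels)
    edges = [(labels[u], labels[v]) for u, v in edges]
    rng.shuffle(edges)
    return edges, labels[0], labels[depth]


def shortest_path_length(edges, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def to_prompt(edges, source, target):
    """Render the instance as a natural-language question for a model."""
    edge_text = ", ".join(f"{u} -> {v}" for u, v in edges)
    return (f"Directed edges: {edge_text}. "
            f"How many edges are on the shortest path from {source} to {target}?")
```

Sweeping `depth` upward while holding `branches` and `branch_len` fixed, and scoring model answers against `shortest_path_length`, is one simple way to look for the kind of abrupt accuracy drop the summary describes.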