Reasoning by Superposition: A Perspective on Chain of Continuous Thought

  • #Reasoning
  • #Transformers
  • #Machine Learning
  • Large Language Models (LLMs) demonstrate strong performance on reasoning tasks when using chains of thought (CoTs).
  • Continuous CoTs outperform discrete CoTs in reasoning tasks like directed graph reachability.
  • A two-layer transformer with continuous CoTs can solve directed-graph reachability in D steps, where D is the graph's diameter.
  • Discrete CoTs require O(n²) steps, where n is the number of vertices, making them markedly less efficient.
  • Continuous CoTs encode multiple search frontiers as superposition states, enabling parallel BFS-like exploration (see the first sketch after this list).
  • Discrete CoTs commit to a single path at each step, leading to sequential search and potential local optima (see the second sketch after this list).
  • Experiments confirm that continuous CoTs naturally learn to explore multiple paths without explicit supervision.
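
A minimal sketch of the superposition idea, assuming one-hot vertex embeddings and NumPy (both illustrative choices, not the paper's actual construction): the continuous "thought" vector carries weight on every frontier vertex simultaneously, so a single step propagates along all out-edges at once, and D steps decide reachability, mirroring one BFS layer per step.

```python
import numpy as np

def continuous_cot_reachability(adj: np.ndarray, source: int, target: int,
                                diameter: int) -> bool:
    """BFS by superposition: the thought vector is a weighted mix of all
    frontier vertices, so one step expands every frontier vertex at once."""
    state = np.zeros(adj.shape[0])
    state[source] = 1.0                # superposition starts as the source alone
    for _ in range(diameter):          # D steps suffice: one per BFS layer
        state = state + adj.T @ state  # propagate weight along every out-edge in parallel
        state /= state.sum()           # renormalize (an illustrative choice)
    return state[target] > 0.0         # reachable iff the target carries weight

# Tiny usage example: edges 0->1, 1->2, 0->3; the diameter is 2.
adj = np.zeros((4, 4))
adj[0, 1] = adj[1, 2] = adj[0, 3] = 1.0
print(continuous_cot_reachability(adj, source=0, target=2, diameter=2))  # True
```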
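For contrast, a discrete CoT emits one vertex token per step and therefore searches sequentially. A hypothetical DFS-style walk (a stand-in for token-by-token path exploration, not the paper's exact procedure) shows why the step count can grow to O(n²) on dense graphs:

```python
import numpy as np

def discrete_cot_reachability(adj: np.ndarray, source: int, target: int):
    """Sequential search: each emitted token commits to a single vertex,
    so covering the graph can take up to O(n^2) steps with backtracking."""
    visited, stack, steps = set(), [source], 0
    while stack:
        v = stack.pop()                # one discrete thought = one vertex
        steps += 1
        if v == target:
            return True, steps
        if v not in visited:
            visited.add(v)
            stack.extend(int(u) for u in np.nonzero(adj[v])[0])  # defer neighbors

    return False, steps

adj = np.zeros((4, 4))
adj[0, 1] = adj[1, 2] = adj[0, 3] = 1.0
print(discrete_cot_reachability(adj, source=0, target=2))  # (True, steps taken)
```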