Reasoning by Superposition: A Perspective on Chain of Continuous Thought
- #Reasoning
- #Transformers
- #Machine Learning
- Large Language Models (LLMs) demonstrate strong performance on reasoning tasks when they generate chains of thought (CoTs).
- Continuous CoTs outperform discrete CoTs in reasoning tasks like directed graph reachability.
- A two-layer transformer with continuous CoTs can solve directed graph reachability in D decoding steps, where D is the diameter of the graph.
- Discrete CoTs require O(n²) steps, where n is the number of vertices, making them substantially less efficient on large graphs.
- Continuous CoTs encode multiple search frontiers at once as superposition states, enabling parallel BFS-like exploration (see the sketch after this list).
- Discrete CoTs commit to a single path at each step, which amounts to sequential, greedy search and can get stuck in local optima.
- Experiments confirm that continuous CoTs naturally learn to explore multiple paths without explicit supervision.
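
A minimal numerical sketch of the superposition idea (my own illustration under assumed details, not the paper's actual construction): represent the current BFS frontier as a sum of near-orthogonal vertex embeddings, so a single continuous-thought step expands every frontier vertex in parallel. The toy graph and the `decode`/`step` helpers are hypothetical; in the paper the expansion is carried out by the attention and MLP layers of a two-layer transformer, whereas here it is written out explicitly in NumPy for readability.

```python
import numpy as np

# Sketch of "reasoning by superposition" (illustration only, assumed setup):
# a continuous thought is a sum of vertex embeddings, i.e. a superposition of
# the current BFS frontier, and one step expands every frontier vertex at once.

rng = np.random.default_rng(0)

n, d = 6, 256  # number of vertices, embedding dimension
# Random high-dimensional embeddings are near-orthogonal, so individual
# vertices can be read back out of a superposed sum.
E = rng.standard_normal((n, d)) / np.sqrt(d)

# Toy directed graph: 0 -> 1 -> 3 -> 5 and 0 -> 2 -> 4 -> 5 (diameter 3 from 0).
edges = {(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 5)}
adj = {v: [w for (u, w) in edges if u == v] for v in range(n)}

def decode(thought, threshold=0.5):
    """Read off which vertices are present in the superposition."""
    return {v for v in range(n) if E[v] @ thought > threshold}

def step(thought):
    """One continuous-CoT step: expand the whole superposed frontier at once
    (one parallel BFS layer), then re-superpose. The paper realizes this with
    attention/MLP layers of a two-layer transformer; here it is explicit."""
    frontier = decode(thought)
    nxt = frontier | {w for v in frontier for w in adj[v]}
    return sum(E[v] for v in nxt)

thought = E[0].copy()  # start from the root vertex alone
for t in range(3):     # 3 steps = diameter of this toy graph from vertex 0
    thought = step(thought)
    print(f"step {t + 1}: superposition decodes to {sorted(decode(thought))}")
print("vertex 5 reachable:", 5 in decode(thought))
```

After D steps the superposed thought contains every vertex within distance D of the root, which is why D steps suffice for reachability; a discrete CoT that emits one vertex per step must serialize this exploration instead.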