Hasty Briefs (beta)

Scaling Latent Reasoning via Looped Language Models

4 months ago
  • #Machine Learning
  • #Natural Language Processing
  • #Artificial Intelligence
  • Introduction of Ouro, a family of pre-trained Looped Language Models (LoopLM) that integrate reasoning into pre-training.
  • Key features include iterative computation in latent space, an entropy-regularized objective for learned depth allocation, and scaling to 7.7T tokens.
  • Ouro 1.4B and 2.6B models match the performance of state-of-the-art LLMs of up to 12B parameters across various benchmarks.
  • Advantage stems from superior knowledge manipulation capabilities rather than increased knowledge capacity.
  • LoopLM produces reasoning traces more aligned with final outputs compared to explicit chain-of-thought (CoT).
  • The models are released open source, positioning LoopLM as a novel scaling direction in the reasoning era.
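
The looped-computation idea above can be sketched in a few lines: the same block is applied repeatedly (weight tying), a small exit head scores each depth, and an entropy term over the resulting exit distribution discourages the model from collapsing onto a single depth. This is a minimal illustrative sketch under assumed names (`looped_forward`, `exit_head`, `max_loops`), not Ouro's actual implementation.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

def looped_forward(h, block, exit_head, max_loops=4):
    """Apply one shared block for max_loops iterations.

    Returns the expected final state under a learned exit distribution
    over loop depths, plus that distribution and its entropy (which an
    entropy-regularized objective would encourage to stay high).
    """
    states, exit_logits = [], []
    for _ in range(max_loops):
        h = block(h)               # same function every iteration: weight tying
        states.append(h)
        exit_logits.append(exit_head(h))
    q = softmax(exit_logits)       # distribution over exit depths
    dim = len(states[0])
    # Expected state: weight each depth's state by its exit probability.
    out = [sum(q[t] * states[t][i] for t in range(max_loops)) for i in range(dim)]
    return out, q, entropy(q)

# Toy instantiation with hypothetical parameters (not trained weights).
block = lambda h: [math.tanh(0.9 * x + 0.1) for x in h]
exit_head = lambda h: sum(h)
out, q, H = looped_forward([0.5, -0.2], block, exit_head)
```

In a real LoopLM the block would be a transformer layer stack and the exit distribution would be trained jointly with the language-modeling loss; the sketch only shows how depth becomes a learned, per-input quantity rather than a fixed architectural constant.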