- Introduces Ouro, a family of pre-trained Looped Language Models (LoopLM) that integrates reasoning directly into pre-training.
- Key features include iterative computation in latent space, an entropy-regularized objective for learned depth allocation, and pre-training scaled to 7.7T tokens (see the sketches after this list).
- The Ouro 1.4B and 2.6B models match the performance of state-of-the-art LLMs of up to 12B parameters across a broad range of benchmarks.
- This advantage stems from superior knowledge manipulation, not from increased knowledge storage capacity.
- LoopLM's reasoning traces are more aligned with its final outputs than explicit chain-of-thought (CoT) traces are.
- The models are released open-source, positioning LoopLM as a new scaling direction for the reasoning era.
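
To make the first two ideas concrete, here is a minimal PyTorch-style sketch of a looped forward pass: one weight-shared transformer block is applied for up to `max_loops` recurrent steps (iterative computation in latent space), and a small exit gate turns per-step halting scores into a distribution over exit depths (learned depth allocation). This illustrates the general LoopLM idea under stated assumptions, not the released Ouro architecture; the class name `LoopedLM`, the `exit_gate` parameterization, and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoopedLM(nn.Module):
    """Toy looped LM: one shared block re-applied R times (latent iteration)."""

    def __init__(self, vocab_size=32000, d_model=256, n_heads=4, max_loops=4):
        super().__init__()
        self.max_loops = max_loops
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single transformer block whose weights are reused at every loop
        # step: parameter count stays fixed while effective depth grows.
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # Hypothetical exit gate: scores, from the current latent state, the
        # probability of halting at this loop step (learned depth allocation).
        self.exit_gate = nn.Linear(d_model, 1)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        B, T = input_ids.shape
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(input_ids.device)
        h = self.embed(input_ids)
        step_logits, halt_scores = [], []
        for _ in range(self.max_loops):
            h = self.shared_block(h, src_mask=causal)    # same weights each step
            step_logits.append(self.lm_head(h))           # prediction if exiting now
            halt_scores.append(self.exit_gate(h).squeeze(-1))
        # Stick-breaking: p_exit[t] = lam[t] * prod_{s<t} (1 - lam[s]),
        # renormalized so leftover mass is forced onto the final step.
        lam = torch.sigmoid(torch.stack(halt_scores, dim=-1))        # (B, T, R)
        survive = torch.cumprod(1.0 - lam, dim=-1)
        p_exit = lam * F.pad(survive[..., :-1], (1, 0), value=1.0)
        p_exit = p_exit / p_exit.sum(dim=-1, keepdim=True)
        return torch.stack(step_logits, dim=-2), p_exit   # (B,T,R,V), (B,T,R)
```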
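
And a sketch of an entropy-regularized objective over that exit distribution: the language-modeling loss is taken in expectation over exit depths, with an entropy bonus that discourages the depth allocation from collapsing onto a single step. The exact functional form and weighting Ouro uses may differ; `beta` and the stick-breaking parameterization above are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def loop_lm_loss(step_logits, p_exit, targets, beta=0.01):
    """Expected next-token loss under p_exit, minus an entropy bonus.

    step_logits: (B, T, R, V) per-depth predictions; p_exit: (B, T, R)
    learned exit distribution; targets: (B, T) next-token ids.
    beta is an assumed regularization weight.
    """
    B, T, R, V = step_logits.shape
    ce = F.cross_entropy(                        # token loss at every depth
        step_logits.reshape(-1, V),
        targets.unsqueeze(-1).expand(-1, -1, R).reshape(-1),
        reduction="none",
    ).view(B, T, R)
    expected_ce = (p_exit * ce).sum(dim=-1).mean()
    entropy = -(p_exit * (p_exit + 1e-9).log()).sum(dim=-1).mean()
    return expected_ce - beta * entropy          # entropy-regularized objective
```

A quick shape check, using shifted inputs as toy targets:

```python
model = LoopedLM()
ids = torch.randint(0, 32000, (2, 17))
logits, p_exit = model(ids[:, :-1])
loss = loop_lm_loss(logits, p_exit, ids[:, 1:])
loss.backward()
```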