Next-Latent Prediction Transformers Learn Compact World Models (2025)
a day ago
- #latent-prediction
- #transformers
- #world-modeling
- The paper introduces Next-Latent Prediction (NextLat), an auxiliary training objective for transformers that adds self-supervised predictions in the latent space.
- NextLat encourages transformers to form compact internal world models with belief states and transition dynamics, improving generalization over standard next-token prediction.
- Theoretically, NextLat's latent representations are shown to converge to belief states: compressed summaries of the history that retain what is needed to predict the future.
- Empirically, NextLat shows gains on benchmarks spanning world modeling, reasoning, planning, and language modeling, improving accuracy, the compression of learned representations, and planning ability.
- NextLat also enables variable-length self-speculative decoding, accelerating inference by up to 3.3x in language modeling while leaving transformer architecture and efficiency unchanged.
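
To make the auxiliary objective described in the summary concrete, here is a minimal sketch of how a next-token loss might be combined with a next-latent prediction loss. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `predict_next_latent` transition head, the mean-squared-error distance, and the `weight` coefficient are all hypothetical choices made for this example.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def nextlat_style_loss(latents, logits, tokens, predict_next_latent, weight=1.0):
    """Next-token loss plus an auxiliary next-latent loss (illustrative only).

    latents[t] : latent state after reading tokens[0..t]
    logits[t]  : scores over the vocabulary for tokens[t + 1]
    predict_next_latent(h, tok) : hypothetical transition head that maps the
        current latent and the next token to a predicted next latent
    weight : hypothetical coefficient on the auxiliary term
    """
    steps = len(tokens) - 1
    token_loss = 0.0
    latent_loss = 0.0
    for t in range(steps):
        next_token = tokens[t + 1]
        # Standard next-token prediction term.
        token_loss += cross_entropy(logits[t], next_token)
        # Auxiliary term: the latent at step t, advanced by the next token,
        # should match the latent the model actually produces at step t + 1.
        # Assumption: MSE is used here purely for simplicity; in an autodiff
        # framework the target latent would typically be detached.
        predicted = predict_next_latent(latents[t], next_token)
        latent_loss += mse(predicted, latents[t + 1])
    token_loss /= steps
    latent_loss /= steps
    return token_loss + weight * latent_loss, token_loss, latent_loss


def toy_transition(h, tok):
    """Toy transition rule: a two-dimensional latent counts 1s and 0s seen."""
    return [h[0] + tok, h[1] + (1 - tok)]
```

With toy latents that follow `toy_transition` exactly, the auxiliary term is zero and only the next-token term remains. A transition head that ignores the next token is penalized, which is the sense in which such an objective rewards latents with consistent transition dynamics.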