Language Models Need Sleep
3 hours ago
- #Transformers
- #Long-context AI
- #Sleep Consolidation
- Transformer attention struggles with long contexts because its compute cost grows quadratically with context length.
- A sleep-like mechanism consolidates recent context into fast weights via offline recurrent passes.
- Sleep allows extra computation without affecting real-time inference latency.
- Tested on synthetic and math reasoning tasks where standard models fail.
- Performance improves with longer sleep, especially on tasks that require deeper reasoning.
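
To make the consolidation idea concrete, here is a minimal toy sketch in plain Python. It is not the method from the article: it swaps in a classic delta-rule associative memory as a stand-in for "fast weights", and every name in it (`make_context`, `sleep`, `recall_error`, the `passes` and `lr` parameters) is hypothetical. The only point it illustrates is that replaying the same recent context for more offline passes can store it more accurately in a fixed-size weight matrix, with no extra work at read time.

```python
import random


def make_context(num_pairs, dim, rng):
    """Random (key, value) pairs standing in for recent context; keys are unit-norm."""
    context = []
    for _ in range(num_pairs):
        key = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(k * k for k in key) ** 0.5
        key = [k / norm for k in key]
        value = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        context.append((key, value))
    return context


def make_fast_weights(dim):
    """Fast weights start empty: a dim x dim matrix of zeros."""
    return [[0.0] * dim for _ in range(dim)]


def read(W, key):
    """Read from memory: a single matrix-vector product W @ key."""
    return [sum(w * k for w, k in zip(row, key)) for row in W]


def sleep(W, context, passes, lr=0.5):
    """Offline consolidation (assumed form): replay the context `passes` times,
    applying a delta-rule update W += lr * (value - W @ key) key^T per pair."""
    for _ in range(passes):
        for key, value in context:
            pred = read(W, key)
            for i in range(len(W)):
                err = value[i] - pred[i]
                for j in range(len(key)):
                    W[i][j] += lr * err * key[j]
    return W


def recall_error(W, context):
    """Mean squared error when reading every stored value back by its key."""
    total = 0.0
    for key, value in context:
        pred = read(W, key)
        total += sum((v - p) ** 2 for v, p in zip(value, pred))
    return total / len(context)


if __name__ == "__main__":
    rng = random.Random(0)
    context = make_context(num_pairs=4, dim=8, rng=rng)
    for passes in (0, 1, 5, 50):
        W = sleep(make_fast_weights(8), context, passes)
        print(passes, recall_error(W, context))
```

In this toy setting the recall error should shrink as `passes` grows, while `read` stays a single matrix-vector product regardless of how long the sleep phase ran. The real system described above operates on a language model's context rather than on random vectors, so treat this only as an intuition aid.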