Language Models Need Sleep

a month ago

Transformer attention struggles with long contexts because the cost of standard self-attention grows quadratically with sequence length.
A sleep-like mechanism consolidates recent context into fast weights via offline recurrent passes.
Sleep allows extra computation without affecting real-time inference latency.
The approach is tested on synthetic and math reasoning tasks where standard models fail.
Performance improves with longer sleep, especially on tasks that require deeper reasoning.
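
The consolidation idea can be illustrated with a toy sketch. This is not the article's method: it assumes a single linear fast-weight matrix, a plain delta-rule update, and randomly generated key/value pairs standing in for recent context. All names (`sleep`, `read`, `recall_error`, `run`) and parameters (`passes`, `lr`, `dim`, `n_pairs`) are hypothetical. It only shows the general pattern of replaying buffered context offline so that inference reduces to a cheap lookup.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def read(W, key):
    # Inference-time lookup: one matrix-vector product, no attention over the context.
    return [dot(row, key) for row in W]

def sleep(W, buffer, passes, lr=0.5):
    # Offline consolidation (assumed form): replay the buffered (key, value)
    # pairs several times, nudging W toward W @ key == value (delta rule).
    for _ in range(passes):
        for key, value in buffer:
            pred = read(W, key)
            for i, row in enumerate(W):
                err = value[i] - pred[i]
                for j in range(len(row)):
                    row[j] += lr * err * key[j]
    return W

def recall_error(W, buffer):
    # Mean squared error when reading each stored value back from W.
    total = 0.0
    for key, value in buffer:
        pred = read(W, key)
        total += sum((v - p) ** 2 for v, p in zip(value, pred))
    return total / len(buffer)

def run(passes, dim=32, n_pairs=6, seed=0):
    # Hypothetical "recent context": random unit keys paired with random values.
    rng = random.Random(seed)
    buffer = [
        (unit([rng.gauss(0, 1) for _ in range(dim)]),
         [rng.gauss(0, 1) for _ in range(dim)])
        for _ in range(n_pairs)
    ]
    W = [[0.0] * dim for _ in range(dim)]  # fast weights start empty
    sleep(W, buffer, passes)
    return recall_error(W, buffer)

short_err = run(passes=1)   # brief sleep
long_err = run(passes=30)   # longer sleep over the same buffer
print(short_err, long_err)
```

In this toy setting the extra passes are spent entirely inside `sleep`, while `read` costs the same regardless of how long the consolidation ran, which mirrors the latency argument in the summary.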
