Training-Free Infinite Video Generation via Evolving Memory Tokens
18 hours ago
- #memory-optimization
- #autoregressive-models
- #video-generation
- Autoregressive diffusion enables real-time frame streaming but suffers from fidelity degradation and identity drift due to discarded past context.
- MemRoPE is a training-free framework that combines Memory Tokens and Online RoPE Indexing to preserve both global identity and recent dynamics.
- Memory Tokens compress past keys into dual long-term and short-term streams via exponential moving averages.
- Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically, avoiding conflicting positional phases.
- MemRoPE outperforms existing methods in temporal coherence, visual fidelity, and subject consistency over long video generation.
- The framework targets the identity loss that arises when autoregressive models discard old context to keep the KV cache at a fixed size.
- MemRoPE enables infinite video generation with strong prompt compliance, smooth transitions, and high long-range consistency.
- Evaluations show MemRoPE maintains quality over extended time horizons, such as one hour of continuous video.
- The approach requires no additional training and extends how long autoregressive video generation can run without losing quality.
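
The two mechanisms summarized above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the class name `MemoryKeyCache`, the decay rates (0.99 and 0.5), the choice to update the memory tokens from evicted keys, and the ordering of positions (memory tokens first, then the recent window) are all assumptions made for illustration. What it does show is the general idea: keys are stored unrotated, evicted keys are folded into a slow and a fast exponential moving average, and rotary embeddings are applied only when the cache is read.

```python
import math


def ema_update(memory, key, decay):
    """Blend one key into a memory token: m <- decay * m + (1 - decay) * k."""
    if memory is None:  # first evicted key initializes the token
        return list(key)
    return [decay * m + (1.0 - decay) * k for m, k in zip(memory, key)]


def rope(vec, pos, base=10000.0):
    """Standard rotary embedding: rotate each consecutive pair of dimensions."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # per-pair frequency
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


class MemoryKeyCache:
    """Hypothetical fixed-size key cache with two EMA memory tokens."""

    def __init__(self, window, long_decay=0.99, short_decay=0.5):
        # Decay rates are illustrative assumptions, not values from the source.
        self.window = window
        self.long_decay = long_decay
        self.short_decay = short_decay
        self.long_mem = None   # slow stream: global identity
        self.short_mem = None  # fast stream: recent dynamics
        self.recent = []       # unrotated keys of the most recent frames

    def append(self, key):
        """Store a new unrotated key; fold the evicted key into both streams."""
        self.recent.append(list(key))
        if len(self.recent) > self.window:
            evicted = self.recent.pop(0)
            self.long_mem = ema_update(self.long_mem, evicted, self.long_decay)
            self.short_mem = ema_update(self.short_mem, evicted, self.short_decay)

    def rotated_keys(self):
        """Assign contiguous positions at read time, then apply RoPE."""
        tokens = [m for m in (self.long_mem, self.short_mem) if m is not None]
        tokens += self.recent
        return [rope(k, pos) for pos, k in enumerate(tokens)]


cache = MemoryKeyCache(window=2)
for t in range(50):
    cache.append([float(t), 1.0, 0.0, 2.0])
print(len(cache.rotated_keys()))  # memory tokens plus the recent window
```

Because the averaged keys are never rotated before storage, the moving averages do not mix vectors carrying different positional phases, and the positions used at read time stay within a small fixed range no matter how many frames have been generated. Values would need analogous handling in a real attention layer; that part is omitted here.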