Training-Free Infinite Video Generation via Evolving Memory Tokens

18 hours ago

Autoregressive diffusion enables real-time frame streaming, but fidelity degrades and subject identity drifts because past context is discarded.
MemRoPE is a training-free framework that combines Memory Tokens and Online RoPE Indexing to preserve both global identity and recent dynamics.
Memory Tokens compress past keys into dual long-term and short-term streams via exponential moving averages.
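The dual-stream idea can be illustrated with a minimal sketch. It assumes that each key evicted from the attention window is folded into two running averages, one slow and one fast. The class name `DualMemory`, the helper `ema_update`, and the decay values 0.99 and 0.5 are hypothetical choices for illustration, not taken from MemRoPE itself.

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


def ema_update(memory: Optional[Vector], key: Vector, decay: float) -> Vector:
    """Blend a key into an EMA token: m <- decay * m + (1 - decay) * key."""
    if memory is None:
        # The first evicted key initializes the token.
        return list(key)
    return [decay * m + (1.0 - decay) * k for m, k in zip(memory, key)]


@dataclass
class DualMemory:
    # Assumed decay rates, chosen only to make the two streams differ.
    long_decay: float = 0.99   # slow stream: changes little, tracks global identity
    short_decay: float = 0.5   # fast stream: follows recent dynamics
    long_token: Optional[Vector] = None
    short_token: Optional[Vector] = None

    def absorb(self, evicted_key: Vector) -> None:
        """Fold one evicted key into both streams."""
        self.long_token = ema_update(self.long_token, evicted_key, self.long_decay)
        self.short_token = ema_update(self.short_token, evicted_key, self.short_decay)


memory = DualMemory()
for key in ([1.0, 0.0], [0.0, 1.0], [0.0, 1.0]):
    memory.absorb(key)

print(memory.long_token)   # still close to the first key
print(memory.short_token)  # already dominated by the recent keys
```

In this toy run the slow stream stays near the first key while the fast stream has mostly moved to the later ones, which is the division of labor between long-term and short-term memory that the summary describes.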
Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically, avoiding conflicting positional phases.
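A rough sketch of rotating keys at read time follows. It assumes the cache holds raw (unrotated) keys and that positions are assigned contiguously by current cache slot whenever attention is computed; the summary does not specify the indexing scheme, so that assignment, along with the names `rope_rotate` and `rotate_cache`, is an assumption made for illustration.

```python
import math
from typing import List

Vector = List[float]


def rope_rotate(vec: Vector, position: int, base: float = 10000.0) -> Vector:
    """Apply a rotary position embedding to one even-length vector."""
    dim = len(vec)
    out: Vector = []
    for i in range(0, dim, 2):
        # Each consecutive pair (i, i+1) rotates at its own frequency.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rotate_cache(unrotated_keys: List[Vector]) -> List[Vector]:
    """Assumed scheme: position = current slot index, applied at read time."""
    return [rope_rotate(k, pos) for pos, k in enumerate(unrotated_keys)]


# The cache stores raw keys, so evicting and appending never leaves
# stale rotations behind.
cache = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
rotated = rotate_cache(cache)

cache.pop(0)               # oldest key leaves the window
cache.append([1.0, 0.0])   # newest key enters
rotated_after = rotate_cache(cache)
```

Because the rotation is applied only when the cache is read, every entry receives a phase consistent with its current slot. Had the keys been stored already rotated, entries written at very different absolute positions would sit side by side with mismatched phases.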
MemRoPE outperforms existing methods in temporal coherence, visual fidelity, and subject consistency on long video generation.
The framework addresses memory constraints and identity loss in fixed-size KV caches for autoregressive models.
MemRoPE enables infinite video generation with strong prompt compliance, smooth transitions, and high long-range consistency.
Evaluations show MemRoPE maintains quality over extended time horizons, such as one hour of continuous video.
The approach requires no additional training and pushes the frontier of infinite generation capabilities.
