A²RD: Agentic Autoregressive Diffusion for Long Video Consistency
- #video-synthesis
- #long-video-generation
- #AI-consistency
- A²RD is an Agentic Auto-Regressive Diffusion architecture that decouples creative synthesis from consistency enforcement for long video generation.
- It uses a Retrieve–Synthesize–Refine–Update cycle to generate and self-improve videos segment by segment, mitigating semantic drift and narrative collapse.
- Core components include Multimodal Video Memory, Adaptive Segment Generation, and Hierarchical Test-Time Self-Improvement to ensure visual consistency and coherence.
- A training-free method, A²RD outperforms state-of-the-art baselines by up to 30% in consistency and 20% in narrative coherence on benchmarks.
- LVBench-C is introduced as a challenging benchmark with non-linear entity and environment transitions to stress-test long-horizon consistency in videos.
- Examples provided include single-scene and multi-scene narratives at 3-minute, 5-minute, and 10-minute scales, such as 'The Master Potter's Creation' and 'The Great Museum Heist'.
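The Retrieve–Synthesize–Refine–Update cycle described above can be sketched as a simple control loop. This is a minimal illustration, not the paper's implementation: `VideoMemory`, `synthesize`, `consistency_score`, and `refine` are all hypothetical placeholders standing in for the Multimodal Video Memory, the diffusion backbone, and the test-time self-improvement critic.

```python
from dataclasses import dataclass, field

@dataclass
class VideoMemory:
    """Hypothetical multimodal memory holding per-segment summaries."""
    records: list = field(default_factory=list)

    def retrieve(self, prompt: str, k: int = 3) -> list:
        # Naive keyword-overlap retrieval; a real system would use
        # multimodal embeddings over entities, scenes, and styles.
        scored = sorted(
            self.records,
            key=lambda r: -len(set(prompt.split()) & set(r.split())),
        )
        return scored[:k]

    def update(self, summary: str) -> None:
        self.records.append(summary)

def synthesize(prompt: str, context: list) -> str:
    # Placeholder for the autoregressive diffusion backbone call.
    return f"segment[{prompt} | ctx={len(context)}]"

def consistency_score(segment: str, context: list) -> float:
    # Placeholder critic; the real system scores visual and
    # narrative consistency against retrieved memory.
    return 1.0

def refine(segment: str, context: list) -> str:
    # Placeholder test-time self-improvement step.
    return segment

def generate_long_video(segment_prompts: list, threshold: float = 0.8,
                        max_rounds: int = 3) -> list:
    memory = VideoMemory()
    segments = []
    for prompt in segment_prompts:
        context = memory.retrieve(prompt)            # Retrieve
        seg = synthesize(prompt, context)            # Synthesize
        for _ in range(max_rounds):                  # Refine
            if consistency_score(seg, context) >= threshold:
                break
            seg = refine(seg, context)
        memory.update(f"{prompt}: {seg}")            # Update
        segments.append(seg)
    return segments
```

Because generation is training-free, the loop only orchestrates existing models: the memory grows with each segment, so later segments are conditioned on retrieved context rather than on the full prior video.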