Hasty Briefs (beta)


A²RD: Agentic Autoregressive Diffusion for Long Video Consistency

10 hours ago
  • #video-synthesis
  • #long-video-generation
  • #AI-consistency
  • A²RD is an Agentic Auto-Regressive Diffusion architecture that decouples creative synthesis from consistency enforcement for long video generation.
  • It uses a Retrieve–Synthesize–Refine–Update cycle to synthesize and self-improve videos segment-by-segment, addressing semantic drift and narrative collapse.
  • Core components include Multimodal Video Memory, Adaptive Segment Generation, and Hierarchical Test-Time Self-Improvement to ensure visual consistency and coherence.
  • As a training-free method, A²RD outperforms state-of-the-art baselines by up to 30% in consistency and 20% in narrative coherence on benchmarks.
  • LVBench-C is introduced as a challenging benchmark with non-linear entity and environment transitions to stress-test long-horizon consistency in videos.
  • Examples provided include single-scene and multi-scene narratives at 3-minute, 5-minute, and 10-minute scales, such as 'The Master Potter's Creation' and 'The Great Museum Heist'.
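The Retrieve–Synthesize–Refine–Update cycle described above can be sketched as a simple loop over segments. This is a minimal illustration, not the paper's implementation: every class and function name here (`VideoMemory`, `synthesize`, `refine`, `generate_long_video`) is a hypothetical stand-in, and segments are plain strings rather than video frames.

```python
class VideoMemory:
    """Toy stand-in for the Multimodal Video Memory: stores generated segments."""
    def __init__(self):
        self.entries = []

    def retrieve(self, prompt, k=2):
        # Return the k most recent entries as lightweight context
        # (a real system would do multimodal similarity retrieval).
        return self.entries[-k:]

    def update(self, segment):
        self.entries.append(segment)


def synthesize(prompt, context):
    # Placeholder for the diffusion-based segment generator.
    return f"segment({prompt} | ctx={len(context)})"


def refine(segment, context, passes=2):
    # Placeholder for hierarchical test-time self-improvement:
    # iteratively revise the segment against the retrieved context.
    for _ in range(passes):
        segment = segment + "*"
    return segment


def generate_long_video(prompts):
    memory = VideoMemory()
    video = []
    for prompt in prompts:                     # segment-by-segment generation
        context = memory.retrieve(prompt)      # Retrieve
        segment = synthesize(prompt, context)  # Synthesize
        segment = refine(segment, context)     # Refine
        memory.update(segment)                 # Update
        video.append(segment)
    return video


clips = generate_long_video(["potter shapes clay", "kiln firing", "final glaze"])
```

The key structural point the sketch captures is the decoupling: the synthesizer only creates, while retrieval and refinement enforce consistency against what has already been generated.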