Waypoint-1: Real-Time Interactive Video Diffusion from Overworld

2 months ago
  • #VideoDiffusion
  • #AI
  • #RealTime
  • Waypoint-1 is Overworld’s real-time interactive video diffusion model, controllable via text, mouse, and keyboard.
  • It allows users to create interactive worlds by generating frames based on inputs.
  • The model is trained on 10,000 hours of video game footage with control inputs and text captions.
  • Unlike other models, Waypoint-1 is described as handling control inputs with zero latency, supporting free camera movement with the mouse and keyboard input.
  • Training involves diffusion forcing and self-forcing techniques to improve frame-by-frame generation.
  • WorldEngine is Overworld’s high-performance inference library optimized for low latency and interactivity.
  • Waypoint-1-Small (2.3B parameters) achieves 30 FPS at 4 denoising steps or 60 FPS at 2 steps on a 5090 GPU.
  • Performance optimizations include AdaLN feature caching, a static rolling KV cache, matmul fusion, and torch.compile.
  • A WorldEngine hackathon is scheduled for 1/20/2026, with a 5090 GPU as the prize.
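
The two frame rates quoted for Waypoint-1-Small can be sanity-checked with a little arithmetic. The sketch below assumes that per-frame cost is dominated by the denoising passes, so frame rate is roughly passes per second divided by steps per frame. That assumption and the helper names are illustrative, not taken from Overworld.

```python
# Assumption: each generated frame costs `steps` forward passes of the model,
# and nothing else contributes meaningfully to frame time.

def passes_per_second(fps: float, steps: int) -> float:
    """Model forward passes needed per second at a given frame rate."""
    return fps * steps


def ms_per_pass(fps: float, steps: int) -> float:
    """Time budget for a single denoising pass, in milliseconds."""
    return 1000.0 / passes_per_second(fps, steps)


# The two operating points quoted in the summary: (FPS, steps per frame).
configs = {"4 steps": (30, 4), "2 steps": (60, 2)}

for name, (fps, steps) in configs.items():
    print(f"{name}: {passes_per_second(fps, steps):.0f} passes/s, "
          f"{ms_per_pass(fps, steps):.2f} ms per pass")
```

Under this assumption both settings work out to the same throughput of about 120 passes per second, or roughly 8.33 ms per pass, which is consistent with halving the step count doubling the frame rate.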
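
The summary does not say how WorldEngine implements its static rolling KV cache. One plausible reading is a buffer allocated once at a fixed size, where each new frame overwrites the oldest slot so that shapes never change between steps. The sketch below illustrates that idea in plain Python. The class name `RollingKVCache`, the `window` parameter, and the string placeholders are all hypothetical, and a real implementation would hold GPU tensors rather than Python objects.

```python
class RollingKVCache:
    """Fixed-size ring buffer holding keys/values for the last `window` frames.

    Hypothetical illustration only; not WorldEngine's actual data structure.
    """

    def __init__(self, window: int):
        self.window = window
        # Allocated once; slots are overwritten in place and never resized.
        self.keys = [None] * window
        self.values = [None] * window
        self.count = 0  # total number of frames written so far

    def append(self, key, value) -> None:
        slot = self.count % self.window  # wraps around onto the oldest slot
        self.keys[slot] = key
        self.values[slot] = value
        self.count += 1

    def ordered(self):
        """Return cached (key, value) pairs from oldest to newest."""
        n = min(self.count, self.window)
        start = self.count - n
        return [
            (self.keys[i % self.window], self.values[i % self.window])
            for i in range(start, self.count)
        ]


cache = RollingKVCache(window=3)
for frame in range(5):
    cache.append(f"k{frame}", f"v{frame}")

# Only the three most recent frames remain in the cache.
print(cache.ordered())
```

Keeping the buffer size constant is the kind of property that tends to help compilers such as torch.compile avoid recompilation, though whether WorldEngine relies on it for that reason is not stated in the source.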