Waypoint-1: Real-Time Interactive Video Diffusion from Overworld
16 days ago
- #VideoDiffusion
- #AI
- #RealTime
- Waypoint-1 is Overworld’s real-time interactive video diffusion model, controllable via text, mouse, and keyboard.
- It lets users explore interactive worlds, generating each frame in response to their inputs.
- The model is trained on 10,000 hours of video game footage with control inputs and text captions.
- Unlike other models, Waypoint-1 accepts control inputs with zero latency, allowing free camera movement and keyboard control.
- Training uses diffusion forcing and self-forcing techniques to improve frame-by-frame generation.
- WorldEngine is Overworld’s high-performance inference library optimized for low latency and interactivity.
- Waypoint-1-Small (2.3B parameters) achieves 30 FPS at 4 denoising steps or 60 FPS at 2 steps on an RTX 5090 GPU.
- Performance optimizations include AdaLN feature caching, a static rolling KV cache, matmul fusion, and torch.compile.
- A WorldEngine hackathon is scheduled for January 20, 2026, with an RTX 5090 GPU as the prize.
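
To make the "static rolling KV cache" item from the optimization list more concrete, here is a minimal sketch of the general idea: a fixed-capacity ring buffer that is allocated once and overwrites the oldest frame's entries as new frames arrive. This is not WorldEngine's actual implementation. The class name `RollingKVCache`, its methods, and the use of plain Python objects in place of GPU tensors are all assumptions made for illustration.

```python
from typing import Any, List, Optional, Tuple


class RollingKVCache:
    """Hypothetical fixed-capacity ring buffer of per-frame (key, value) entries."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Storage is allocated once and never resized ("static"), so the
        # buffer shape stays constant no matter how many frames are generated.
        self._slots: List[Optional[Tuple[Any, Any]]] = [None] * capacity
        self._next = 0   # index of the slot that will be overwritten next
        self._count = 0  # number of valid entries currently stored

    def append(self, key: Any, value: Any) -> None:
        """Store the newest frame's entry, evicting the oldest when full."""
        self._slots[self._next] = (key, value)
        self._next = (self._next + 1) % self.capacity
        self._count = min(self._count + 1, self.capacity)

    def window(self) -> List[Tuple[Any, Any]]:
        """Return the cached entries ordered from oldest to newest."""
        start = (self._next - self._count) % self.capacity
        return [self._slots[(start + i) % self.capacity] for i in range(self._count)]


# Usage: with room for 3 frames, only the 3 most recent of 5 frames remain.
cache = RollingKVCache(capacity=3)
for frame in range(5):
    cache.append(f"k{frame}", f"v{frame}")
print(cache.window())
```

A fixed-size buffer keeps memory use and attention cost bounded during arbitrarily long sessions, and constant shapes generally suit compiled execution; how WorldEngine actually lays out and indexes its cache is not stated in this summary.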