Hasty Briefs (beta)

Wan Streamer v0.1: End-to-End Real-Time Interactive Foundation Models

8 hours ago
  • #Foundation Model
  • #Multimodal AI
  • #Real-time Interaction
  • Wan Streamer is an end-to-end, native-streaming interactive foundation model designed for real-time, low-latency, full-duplex audio-visual interaction.
  • It models language, audio, and video as both input and output within a single Transformer, using block-causal attention for incremental streaming.
  • The system achieves about 200 ms of model-side response latency and 550 ms of total interaction latency once network delay is included, keeping end-to-end communication under one second.
  • Unlike cascaded pipelines, Wan Streamer integrates perception, reasoning, and generation in one model, avoiding delays and synchronization issues.
  • It is deployed as a thinker-performer pipeline split across two GPUs, which maximizes overlap between the two stages and maintains real-time throughput.
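
To make the block-causal attention mentioned above more concrete, here is a minimal sketch of how such a mask can be built. It is an illustration of the general technique only, not Wan Streamer's implementation: the function names `block_causal_mask` and `render`, the fixed block size, and the flat token layout are all assumptions, since the summary does not say how the model partitions its streams into blocks.

```python
def block_causal_mask(num_tokens: int, block_size: int) -> list[list[bool]]:
    """Return mask[q][k] == True where query token q may attend to key token k.

    Hypothetical helper: tokens are grouped into consecutive blocks of
    `block_size`. A token sees every token in its own block (bidirectional
    within the block) and every token in earlier blocks, but nothing later.
    """
    if num_tokens < 0 or block_size <= 0:
        raise ValueError("num_tokens must be >= 0 and block_size must be > 0")
    return [
        [k // block_size <= q // block_size for k in range(num_tokens)]
        for q in range(num_tokens)
    ]


def render(mask: list[list[bool]]) -> str:
    """Draw the mask as text: 'x' = attention allowed, '.' = masked out."""
    return "\n".join(
        "".join("x" if allowed else "." for allowed in row) for row in mask
    )


if __name__ == "__main__":
    # Six tokens in blocks of two: each new block extends the visible context.
    print(render(block_causal_mask(6, 2)))
```

Because no token depends on a later block, a model using a mask of this shape can emit each block as soon as it is computed and reuse cached keys and values from earlier blocks, which is the property that makes incremental streaming possible.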