Apple Releases Open-Weights Video Model
9 days ago
- #normalizing-flows
- #autoregressive-models
- #video-generation
- STARFlow-V is presented as the first causal video generator based on normalizing flows, with quality reported to match video diffusion models.
- It supports end-to-end training and exact likelihood estimation, and handles text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) generation natively.
- The model uses a global-local architecture to separate global temporal reasoning from local within-frame details.
- Flow-Score Matching Denoising pairs maximum-likelihood training of the normalizing flow with a flow-score matching objective to improve denoising.
- Video-Aware Jacobi Iteration enables block-wise parallel updates for efficient generation while maintaining causality.
- The 7B-parameter model is trained on 70M text-video pairs and 400M text-image pairs.
- The model supports text-to-video, image-to-video, and video-to-video generation tasks without architectural changes.
- Reported results show visual fidelity and temporal consistency competitive with diffusion-based baselines.
- Failure cases involve complex motion and physical interactions, which the authors attribute to insufficient training and low-quality data.
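The exact-likelihood property comes from the change-of-variables formula that all normalizing flows share. The toy sketch below assumes a one-dimensional sequence and a hypothetical `conditioner` function standing in for a causal network; it illustrates the principle and is not Apple's implementation. Because each element's shift and scale depend only on earlier elements, the Jacobian is triangular and its log-determinant reduces to a sum.

```python
import math
from typing import List, Tuple


# Hypothetical toy conditioner: shift and log-scale for position t depend only
# on the previous element, so the flow is causal (x_t never sees x_>=t).
def conditioner(prefix: List[float]) -> Tuple[float, float]:
    prev = prefix[-1] if prefix else 0.0
    mu = 0.5 * prev
    log_sigma = 0.1 * prev
    return mu, log_sigma


def encode(x: List[float]) -> Tuple[List[float], float]:
    """Map data x to noise z; also return log|det dz/dx|."""
    z: List[float] = []
    log_det = 0.0
    for t, x_t in enumerate(x):
        mu, log_sigma = conditioner(x[:t])
        z.append((x_t - mu) * math.exp(-log_sigma))
        # dz/dx is lower triangular, so its log-determinant is the sum of the
        # diagonal entries log(1 / sigma_t) = -log_sigma_t.
        log_det -= log_sigma
    return z, log_det


def log_likelihood(x: List[float]) -> float:
    """Exact log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    z, log_det = encode(x)
    log_base = sum(-0.5 * (z_t * z_t + math.log(2 * math.pi)) for z_t in z)
    return log_base + log_det


print(log_likelihood([0.3, -1.2, 0.7, 2.0]))
```

For this toy conditioner the result equals the sum of per-step Gaussian log-densities, which is a convenient sanity check; with a learned conditioner, the same formula still gives an exact likelihood rather than a bound.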
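Causality makes a flow of this kind cheap to evaluate on data but sequential to invert when sampling. Jacobi iteration replaces the one-position-at-a-time loop with repeated sweeps in which every position is updated from the previous iterate, so each sweep could run in parallel. The sketch below is a generic block-wise Jacobi decoder over a 1-D sequence with a hypothetical `conditioner`; the blocks loosely stand in for groups of frames, and it does not reproduce the video-aware details of the STARFlow-V method.

```python
import math
from typing import List, Tuple


# Hypothetical causal conditioner: shift and log-scale for position t
# depend only on x[:t].
def conditioner(prefix: List[float]) -> Tuple[float, float]:
    prev = prefix[-1] if prefix else 0.0
    return 0.5 * prev, 0.1 * prev


def decode_sequential(z: List[float]) -> List[float]:
    """Reference inverse: x_t = mu_t + sigma_t * z_t, one position at a time."""
    x: List[float] = []
    for z_t in z:
        mu, log_sigma = conditioner(x)
        x.append(mu + math.exp(log_sigma) * z_t)
    return x


def decode_jacobi_blockwise(
    z: List[float], block_size: int = 4, tol: float = 1e-10, max_iters: int = 100
) -> Tuple[List[float], int]:
    """Earlier blocks stay frozen (preserving causality); inside the current
    block, all positions are refreshed from the previous iterate until the
    iterate stops changing."""
    x: List[float] = []
    total_sweeps = 0
    for start in range(0, len(z), block_size):
        z_blk = z[start:start + block_size]
        guess = [0.0] * len(z_blk)  # naive initialization
        for _ in range(max_iters):
            total_sweeps += 1
            # Each update reads only `guess` from the previous sweep, so these
            # updates are independent and could be computed in parallel.
            new = []
            for i, z_t in enumerate(z_blk):
                mu, log_sigma = conditioner(x + guess[:i])
                new.append(mu + math.exp(log_sigma) * z_t)
            delta = max(abs(a - b) for a, b in zip(new, guess))
            guess = new
            if delta < tol:
                break
        x.extend(guess)
    return x, total_sweeps


z = [0.5, -1.0, 0.3, 1.2, -0.7, 0.9, 0.0, -0.4, 1.5, -2.0]
x_jac, sweeps = decode_jacobi_blockwise(z, block_size=4)
```

After sweep k, the first k positions of a block are exact, so a block of B positions reaches the sequential answer within B + 1 sweeps. Any speedup depends on converging in fewer sweeps than positions, which this sketch does not guarantee.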