Apple Releases Open-Weights Video Model

9 days ago

Apple presents STARFlow-V as the first normalizing-flow-based causal video generator and reports that it matches video diffusion models in quality.
It offers end-to-end training, exact likelihood estimation, and native multi-task support for text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) generation.
The model uses a global-local architecture to separate global temporal reasoning from local within-frame details.
Flow-Score Matching Denoising combines the normalizing flow's maximum-likelihood objective with a flow-score matching objective to improve denoising.
Video-Aware Jacobi Iteration enables block-wise parallel updates for efficient generation while maintaining causality.
STARFlow-V is a 7B-parameter model trained on 70M text-video pairs and 400M text-image pairs.
The same weights handle all three tasks (T2V, I2V, V2V) without architectural changes.
Empirical comparisons with diffusion-based baselines show strong visual fidelity and temporal consistency.
Failure cases include complex motion and physical interactions, attributed to insufficient training and low-quality training data.
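The exact likelihood estimation mentioned above is a general property of normalizing flows: an invertible map f sends data x to a latent z with a simple density, and the change-of-variables formula gives log p(x) = log p_Z(f(x)) + log |det ∂f/∂x|. The sketch below is a toy two-dimensional affine coupling layer with a standard normal base, not STARFlow-V's architecture; `scale_shift` is a hypothetical stand-in for learned networks. Because the Jacobian is triangular, its log-determinant is just the log-scale, and integrating the resulting density over a grid should give a total very close to 1.

```python
import math

def log_std_normal(z):
    """Log-density of a standard normal base distribution, summed over dimensions."""
    return sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)

def scale_shift(x1):
    """Hypothetical stand-ins for the learned log-scale and shift networks."""
    return 0.3 * math.tanh(x1), 0.5 * math.sin(x1)

def coupling_forward(x):
    """Data -> latent. x1 passes through; x2 is scaled and shifted using x1 only.

    The Jacobian is lower-triangular with diagonal (1, exp(log_scale)), so
    log|det J| is simply log_scale.
    """
    x1, x2 = x
    log_scale, shift = scale_shift(x1)
    return (x1, x2 * math.exp(log_scale) + shift), log_scale

def coupling_inverse(z):
    """Latent -> data: the exact inverse, used for sampling."""
    z1, z2 = z
    log_scale, shift = scale_shift(z1)
    return (z1, (z2 - shift) * math.exp(-log_scale))

def log_prob(x):
    """Exact log-likelihood via the change-of-variables formula."""
    z, log_det = coupling_forward(x)
    return log_std_normal(z) + log_det

# Sanity check: the density should integrate to ~1 (midpoint rule on a grid).
step = 0.1
grid = [-12 + step * (i + 0.5) for i in range(240)]
total = sum(math.exp(log_prob((a, b))) for a in grid for b in grid) * step ** 2
print(f"integral of p(x) over the grid: {total:.6f}")
```

Stacking layers keeps the same bookkeeping (base log-density plus the sum of each layer's log-determinant), which is why a flow's likelihood is exact rather than a lower bound.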
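One way a density model's score can be used for denoising is Tweedie's formula: if y = x + noise with variance σ², then E[x | y] = y + σ² ∇_y log p(y), where p is the density of the noisy y. The brief does not spell out how flow-score matching works, so this sketch shows only that general principle, not STARFlow-V's procedure. To keep it checkable, it uses a one-dimensional Gaussian prior whose noisy score is known in closed form; in a real model the score would come from the learned network, and the function names here are hypothetical.

```python
import math
import random

def noisy_score(y, mean, prior_var, noise_var):
    """Score d/dy log p(y) of the noisy marginal y = x + noise.

    With a Gaussian prior x ~ N(mean, prior_var) and Gaussian noise, the noisy
    marginal is N(mean, prior_var + noise_var), so the score is analytic.
    A learned model would supply this value instead.
    """
    return -(y - mean) / (prior_var + noise_var)

def tweedie_denoise(y, score, noise_var):
    """Tweedie's formula: E[x | y] = y + noise_var * score(y)."""
    return y + noise_var * score

def posterior_mean(y, mean, prior_var, noise_var):
    """Closed-form Gaussian posterior mean, used only to check the result."""
    return (prior_var * y + noise_var * mean) / (prior_var + noise_var)

random.seed(0)
MEAN, PRIOR_VAR, NOISE_VAR = 2.0, 1.0, 0.25
n = 10_000
sq_err_noisy = sq_err_denoised = 0.0
for _ in range(n):
    x = random.gauss(MEAN, math.sqrt(PRIOR_VAR))      # clean sample
    y = x + random.gauss(0.0, math.sqrt(NOISE_VAR))   # noisy observation
    x_hat = tweedie_denoise(y, noisy_score(y, MEAN, PRIOR_VAR, NOISE_VAR), NOISE_VAR)
    sq_err_noisy += (y - x) ** 2
    sq_err_denoised += (x_hat - x) ** 2
mse_noisy, mse_denoised = sq_err_noisy / n, sq_err_denoised / n
print(f"MSE noisy={mse_noisy:.3f}  denoised={mse_denoised:.3f}")
```

For this Gaussian case the score-based estimate coincides with the exact posterior mean, so the denoised error should fall below the raw noise level.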
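Generating from an autoregressive flow means inverting it position by position, since each position's shift and scale depend on the already-generated earlier positions. Jacobi iteration replaces that sequential loop with fixed-point sweeps: every position in a block is updated at once from the previous sweep's guesses, and because dependencies only point backward, position j of a block is exact after at most j + 1 sweeps. The sketch below applies this block by block to a toy one-dimensional sequence; the conditioner `shift_scale` and the block size are hypothetical, and it does not model the video-specific parts of Video-Aware Jacobi Iteration.

```python
import math

def shift_scale(prefix):
    """Hypothetical causal conditioner: (shift, log_scale) from earlier values only."""
    if not prefix:
        return 0.0, 0.0
    prev = prefix[-1]
    return 0.5 * math.tanh(prev), 0.2 * math.sin(prev)

def forward(x):
    """Data -> latent. Positions are independent given x, so this direction parallelizes."""
    z = []
    for i, xi in enumerate(x):
        shift, log_scale = shift_scale(x[:i])
        z.append(xi * math.exp(log_scale) + shift)
    return z

def inverse_sequential(z):
    """Latent -> data, one position at a time (len(z) dependent steps)."""
    x = []
    for zi in z:
        shift, log_scale = shift_scale(x)
        x.append((zi - shift) * math.exp(-log_scale))
    return x

def inverse_blockwise_jacobi(z, block_size=4, tol=1e-10):
    """Latent -> data, block by block, with Jacobi sweeps inside each block.

    Earlier blocks are final, so causality is preserved. Inside a block, every
    position is recomputed from the previous sweep's guesses; position j is
    exact after at most j + 1 sweeps, so len(block) sweeps always suffice.
    """
    x, sweeps = [], 0
    for start in range(0, len(z), block_size):
        block_z = z[start:start + block_size]
        guess = [0.0] * len(block_z)  # initial guess for this block
        for _ in range(len(block_z)):
            new = []
            # Each update reads only the previous guess, so the updates are
            # independent and could run in parallel; a plain loop is used here.
            for j, zj in enumerate(block_z):
                shift, log_scale = shift_scale(x + guess[:j])
                new.append((zj - shift) * math.exp(-log_scale))
            sweeps += 1
            delta = max(abs(a - b) for a, b in zip(new, guess))
            guess = new
            if delta < tol:  # block has stopped changing: accept it early
                break
        x.extend(guess)
    return x, sweeps

z = [math.sin(0.7 * k) for k in range(16)]
x_seq = inverse_sequential(z)
x_jac, sweeps = inverse_blockwise_jacobi(z, block_size=4)
print("max difference:", max(abs(a - b) for a, b in zip(x_seq, x_jac)))
print("Jacobi sweeps:", sweeps, "vs sequential steps:", len(z))
```

In the worst case the sweep count equals the block length, so any speedup depends on sweeps settling early when dependencies are weak and on running each sweep's updates in parallel on real hardware.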

Hasty Briefs (beta)