Apple Releases Open Weights Video Model

9 days ago
  • #normalizing-flows
  • #autoregressive-models
  • #video-generation
  • STARFlow-V is presented as the first causal video generator based on normalizing flows, with quality reported to match video diffusion models.
  • It offers end-to-end training, exact likelihood estimation, and native multi-task support for T2V/I2V/V2V generation.
  • The model uses a global-local architecture to separate global temporal reasoning from local within-frame details.
  • Flow-Score Matching Denoising combines normalizing flow maximum likelihood with flow-score matching for improved denoising.
  • Video-Aware Jacobi Iteration enables block-wise parallel updates for efficient generation while maintaining causality.
  • STARFlow-V is a 7B-parameter model trained on 70M text-video pairs and 400M text-image pairs.
  • The model supports text-to-video, image-to-video, and video-to-video generation tasks without architectural changes.
  • Empirical results show strong visual fidelity and temporal consistency compared to diffusion-based baselines.
  • Known failure cases involve complex motion and physical interactions, attributed to insufficient training and low-quality data.
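
Two of the points above, exact likelihood estimation and block-wise Jacobi iteration, can be illustrated on a toy problem. The sketch below is not STARFlow-V's code: it assumes a scalar affine autoregressive flow in which each step depends only on the previous value, and the parameters `A` and `B` and all function names are hypothetical. It shows how the change-of-variables formula gives an exact log-likelihood, and how inversion (sampling) can update a whole block of positions in parallel sweeps while later blocks wait for earlier ones, which keeps generation causal.

```python
import math

A, B = 0.5, 0.3  # hypothetical toy parameters, not values from the model


def shift_scale(prev):
    """Shift and log-scale for one step, conditioned only on the previous value."""
    return A * prev, B * math.tanh(prev)


def forward(x):
    """Map data x to latents z; return z and the exact log-likelihood of x."""
    z, logp, prev = [], 0.0, 0.0
    for xt in x:
        mu, s = shift_scale(prev)
        zt = (xt - mu) * math.exp(-s)
        # Standard-normal log-density of zt plus the log-determinant term (-s).
        logp += -0.5 * (zt * zt + math.log(2 * math.pi)) - s
        z.append(zt)
        prev = xt
    return z, logp


def invert_sequential(z):
    """Reference inversion: one position at a time, strictly in order."""
    x, prev = [], 0.0
    for zt in z:
        mu, s = shift_scale(prev)
        prev = zt * math.exp(s) + mu
        x.append(prev)
    return x


def invert_jacobi_blockwise(z, block=4, tol=1e-10):
    """Invert block by block; inside a block, every position is updated from
    the previous sweep's values, so one sweep is parallel across the block."""
    x = [0.0] * len(z)
    sweeps = 0
    for start in range(0, len(z), block):
        end = min(start + block, len(z))
        for _ in range(end - start):  # exact after at most block-size sweeps
            new = []
            for t in range(start, end):
                prev = x[t - 1] if t > 0 else 0.0  # value from the last sweep
                mu, s = shift_scale(prev)
                new.append(z[t] * math.exp(s) + mu)
            delta = max(abs(a - b) for a, b in zip(new, x[start:end]))
            x[start:end] = new  # write back only after the full sweep
            sweeps += 1
            if delta < tol:
                break
    return x, sweeps


x = [0.3, -1.2, 0.8, 0.1, 2.0, -0.5, 0.0, 1.1, -0.7]
z, logp = forward(x)
x_seq = invert_sequential(z)
x_jac, sweeps = invert_jacobi_blockwise(z, block=4)
```

In this toy, the Jacobi result agrees with sequential inversion because the dependency structure is triangular: after k sweeps, the first k positions of a block are already exact. In the worst case this needs as many sweeps as a block has positions, so any speed-up in practice depends on the iteration converging in fewer sweeps and on each sweep running in parallel on hardware.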