Netflix just dropped their first public model on Hugging Face: VOID

7 hours ago

VOID removes objects from videos together with the interactions they induce in the scene, including physical interactions such as objects falling when a person is removed.
It is built on CogVideoX-Fun-V1.5-5b-InP, fine-tuned for video inpainting with interaction-aware quadmask conditioning, and requires checkpoints such as void_pass1.safetensors for the base inpainting step.
It can be run from a provided notebook or via the CLI. The inputs are a video, a quadmask (generated by a pipeline using SAM2 and Gemini), and a text prompt describing the scene after removal.
Training used paired counterfactual videos from HUMOTO (human-object interactions in Blender) and Kubric (object-only interactions) and was run on 8x A100 80GB GPUs.
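To make the input contract above concrete, here is a minimal sketch that bundles the three inputs (video, quadmask, prompt) and sanity-checks them before inference. It is not VOID's API: the `RemovalRequest` class, `validate` method, and `ASSUMED_QUADMASK_LABELS` placeholder are all hypothetical. The brief does not say how the quadmask's four regions are encoded, so the label values 0 to 3 are purely an assumption. Frames are plain nested lists so the sketch runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import List

# HYPOTHETICAL: a "quad" mask is assumed to use four distinct per-pixel values.
# VOID's actual value encoding is not described in the brief; 0..3 is a placeholder.
ASSUMED_QUADMASK_LABELS = {0, 1, 2, 3}

Frame = List[List[int]]  # H x W grid of per-pixel values (grayscale for simplicity)


@dataclass
class RemovalRequest:
    """Hypothetical container for the three inputs the brief lists."""
    video: List[Frame]     # T video frames
    quadmask: List[Frame]  # T masks, each the same H x W as its video frame
    prompt: str            # text describing the scene after removal

    def validate(self) -> None:
        """Raise ValueError if the inputs are obviously inconsistent."""
        if not self.prompt.strip():
            raise ValueError("prompt must describe the scene after removal")
        if len(self.video) != len(self.quadmask):
            raise ValueError("video and quadmask must have the same number of frames")
        for t, (frame, mask) in enumerate(zip(self.video, self.quadmask)):
            # Height must match, and every row width must match.
            same_height = len(frame) == len(mask)
            same_widths = all(len(r) == len(m) for r, m in zip(frame, mask))
            if not (same_height and same_widths):
                raise ValueError(f"frame {t}: mask size does not match video frame")
            for row in mask:
                unexpected = set(row) - ASSUMED_QUADMASK_LABELS
                if unexpected:
                    raise ValueError(f"frame {t}: unexpected mask values {sorted(unexpected)}")
```

For example, `RemovalRequest(video=[[[5, 5], [5, 5]]], quadmask=[[[0, 1], [2, 3]]], prompt="an empty table").validate()` passes, while a mask containing a value outside the assumed label set raises `ValueError`. Checking shapes and labels up front is cheap compared with running a 5B-parameter video model on malformed inputs.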
