V-JEPA 2

10 months ago

V-JEPA 2 is a state-of-the-art world model trained on video for visual understanding and prediction.
It enables zero-shot robot control in new environments without extensive training data.
The model excels in motion understanding, visual reasoning, and anticipating actions from contextual cues.
V-JEPA 2 is trained in two phases: self-supervised learning from visual data, followed by fine-tuning on robot data.
The robot-data phase used 62 hours of data from the DROID dataset, and the resulting model can perform tasks such as reaching, grasping, and pick-and-place.
Potential applications include robotic assistants for household chores and wearable assistants for real-time hazard alerts.
Meta is releasing V-JEPA 2 for the community to build upon, expecting it to power novel experiences across diverse domains.
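
The two-phase recipe summarized above (learn a representation from unlabeled observations, then train on a small amount of robot data) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than Meta's implementation: the "encoder" is a simple standardization instead of a video model, the predictor is a one-parameter linear fit, and the names fit_encoder, fit_predictor, and choose_action are invented for illustration. The final action-selection step is an assumption about how such a model could be used for control; it is not described in this summary.

```python
import random

# Phase 1 (stand-in): fit an encoder from unlabeled observations only.
# The real model learns a video encoder with a self-supervised objective;
# here the "encoder" is just a standardization fitted without any actions.
def fit_encoder(observations):
    mean = sum(observations) / len(observations)
    var = sum((o - mean) ** 2 for o in observations) / len(observations)
    std = var ** 0.5 or 1.0  # fall back to 1.0 if all observations are equal
    return lambda o: (o - mean) / std

# Phase 2: keep the encoder frozen and fit an action-conditioned predictor
# z_next ~ z + gain * action by one-parameter least squares.
def fit_predictor(encode, transitions):
    num = den = 0.0
    for obs, action, next_obs in transitions:
        dz = encode(next_obs) - encode(obs)
        num += action * dz
        den += action * action
    gain = num / den
    return lambda z, action: z + gain * action

# Assumed control step: pick the candidate action whose predicted latent
# state lands closest to the latent state of the goal observation.
def choose_action(encode, predict, obs, goal_obs, candidates):
    z, z_goal = encode(obs), encode(goal_obs)
    return min(candidates, key=lambda a: abs(predict(z, a) - z_goal))

random.seed(0)

# Plenty of unlabeled "video" (toy scalar observations, no actions).
video = [random.uniform(0.0, 10.0) for _ in range(1000)]
encode = fit_encoder(video)

# A small amount of toy "robot" data: position moves by 0.5 * action.
robot = []
for _ in range(50):
    obs = random.uniform(0.0, 10.0)
    action = random.uniform(-1.0, 1.0)
    robot.append((obs, action, obs + 0.5 * action))
predict = fit_predictor(encode, robot)

candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
best = choose_action(encode, predict, 4.0, 4.5, candidates)
print(best)
```

The point of the split is that the second phase only has to learn how actions move the already-learned representation, which is why a comparatively small robot dataset can suffice in this toy setting.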

Hasty Briefs (beta)