Deep Dive into Yann LeCun's JEPA

9 months ago

Yann LeCun proposes the Joint Embedding Predictive Architecture (JEPA) as an alternative to today's dominant approaches, such as LLMs and other generative AI models, which he criticizes for lacking common sense, planning, and reasoning.
Current AI models, including LLMs, face issues like hallucinations, limited reasoning, and lack of long-term planning, which JEPA aims to address.
LeCun's framework for human-level AI includes components like the Configurator, Perception, World Model, Cost Module, and Actor, designed to mimic human learning and decision-making.
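JEPA uses self-supervised learning and energy-based models. Instead of predicting raw inputs such as pixels, it predicts the representation (embedding) of a target, such as a future state, from the representation of its context. This lets it ignore unpredictable details and handle uncertainty.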
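To make "predicting in representation space" concrete, here is a minimal toy sketch, assuming tiny linear-plus-tanh encoders in place of the vision transformers real JEPAs use. A context encoder embeds the context x, a predictor guesses the target's embedding, and the energy is the distance between that guess and the target encoder's embedding of y. The sizes, the 0.996 momentum, and all function names are illustrative assumptions. The squared L2 distance is one choice; published variants use L2 or L1 distances. The target encoder is updated only as an exponential moving average (EMA) of the context encoder, a common way to help avoid representation collapse.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB = 16, 8  # hypothetical input and embedding sizes


def encode(W, x):
    """Toy encoder (linear map + tanh), standing in for a ViT."""
    return np.tanh(W @ x)


def predict(P, s_x):
    """Toy predictor: maps a context embedding to a guess of the target embedding."""
    return P @ s_x


def energy(s_pred, s_y):
    """Energy = squared L2 distance in embedding space; low means compatible."""
    return float(np.sum((s_pred - s_y) ** 2))


def ema_update(W_target, W_context, momentum=0.996):
    """The target encoder follows the context encoder as an exponential moving average."""
    return momentum * W_target + (1.0 - momentum) * W_context


W_ctx = rng.normal(scale=0.1, size=(D_EMB, D_IN))  # context encoder weights (trained)
W_tgt = W_ctx.copy()  # target encoder: updated only by EMA, never by gradients
P = np.eye(D_EMB)  # predictor weights (trained)

x = rng.normal(size=D_IN)  # context, e.g. visible patches or past frames
y = x + 0.05 * rng.normal(size=D_IN)  # a target compatible with x
z = rng.normal(size=D_IN)  # an unrelated input, for comparison

s_x = encode(W_ctx, x)
e_compatible = energy(predict(P, s_x), encode(W_tgt, y))
e_unrelated = energy(predict(P, s_x), encode(W_tgt, z))

# In training, e_compatible would be the loss: its gradient updates W_ctx and P,
# with the target embedding treated as a constant (stop-gradient), then:
W_tgt = ema_update(W_tgt, W_ctx)
```

Because no pixels are ever reconstructed, the model is free to drop details of y that cannot be predicted from x. Compatible pairs should end up with low energy and unrelated pairs with higher energy.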
Hierarchical JEPA (H-JEPA) stacks JEPAs at multiple levels of abstraction. Lower levels make short-term, detailed predictions, and higher levels make long-term, abstract ones, which enhances planning capabilities.
Recent implementations, I-JEPA, V-JEPA, and MC-JEPA, apply JEPA to images, videos, and joint motion-and-content learning, respectively, and show promise for self-supervised learning.
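For images, I-JEPA trains by hiding several target blocks of image patches and predicting their embeddings from a larger context block. The sketch below is a simplified take on that multi-block masking step. The 14x14 patch grid, four targets, and the scale and aspect-ratio ranges are modeled on values described for I-JEPA, but treat them as assumptions. `sample_block` and `multi_block_mask` are hypothetical helper names.

```python
import numpy as np


def sample_block(rng, grid, scale_range, aspect_range):
    """Sample a rectangular block of patch indices on a grid x grid patch layout."""
    scale = rng.uniform(*scale_range)  # fraction of all patches the block covers
    aspect = rng.uniform(*aspect_range)  # height / width
    area = scale * grid * grid
    h = min(grid, max(1, int(round(np.sqrt(area * aspect)))))
    w = min(grid, max(1, int(round(np.sqrt(area / aspect)))))
    top = int(rng.integers(0, grid - h + 1))
    left = int(rng.integers(0, grid - w + 1))
    return {r * grid + c for r in range(top, top + h) for c in range(left, left + w)}


def multi_block_mask(rng, grid=14, num_targets=4):
    """Return (context patch indices, list of target patch index sets)."""
    targets = [sample_block(rng, grid, (0.15, 0.2), (0.75, 1.5)) for _ in range(num_targets)]
    context = sample_block(rng, grid, (0.85, 1.0), (1.0, 1.0))
    context -= set().union(*targets)  # the context must not reveal any target patch
    return context, targets


rng = np.random.default_rng(0)
context, targets = multi_block_mask(rng)
print(len(context), [len(t) for t in targets])
```

In training, the context encoder sees only the `context` patches. The predictor receives those embeddings plus positional information for each target block and regresses the target encoder's embeddings of the hidden patches, as in the energy sketch for JEPA in general.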
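V-JEPA 2 scales up model size, training data, and training duration, using a progressive-resolution training schedule during pretraining. It then adds post-training stages: aligning the video encoder with an LLM for video question answering, and action-conditioned post-training (V-JEPA 2-AC) for robotic planning.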
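Action-conditioned planning ties LeCun's modules together. The world model predicts future embeddings given actions, the cost is an energy measuring the distance to a goal embedding, and the actor searches for actions that minimize it. The V-JEPA 2 work describes robot planning along these lines, using the cross-entropy method (CEM) over action sequences and an L1 distance to the embedding of a goal image. The sketch below is a toy version under stated assumptions: the linear world model `s_next = s + B @ a`, the sizes, and the CEM settings are hypothetical stand-ins, not V-JEPA 2's actual predictor or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
D_EMB, D_ACT, HORIZON = 4, 2, 3  # hypothetical sizes

# Hypothetical action-conditioned world model in embedding space: s_next = s + B @ a.
B = rng.normal(size=(D_EMB, D_ACT))


def rollout(s0, actions):
    """Apply the world model step by step to predict the final embedding."""
    s = s0
    for a in actions:
        s = s + B @ a
    return s


def energy(s_pred, s_goal):
    """L1 distance between the predicted and goal embeddings."""
    return float(np.abs(s_pred - s_goal).sum())


def cem_plan(s0, s_goal, samples=200, elites=20, iters=10):
    """Cross-entropy method: refit a Gaussian over action sequences to the lowest-energy samples."""
    mu = np.zeros((HORIZON, D_ACT))
    sigma = np.ones((HORIZON, D_ACT))
    for _ in range(iters):
        cands = mu + sigma * rng.normal(size=(samples, HORIZON, D_ACT))
        costs = np.array([energy(rollout(s0, c), s_goal) for c in cands])
        elite = cands[np.argsort(costs)[:elites]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu


s0 = np.zeros(D_EMB)  # embedding of the current observation
s_goal = rollout(s0, [np.array([0.5, -0.3])] * HORIZON)  # embedding of a reachable goal
plan = cem_plan(s0, s_goal)
first_action = plan[0]  # execute only this, observe, and replan (receding horizon)
```

Executing only the first action and replanning from the new observation keeps errors in the learned world model from compounding over long rollouts.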
The ultimate goal is a V-JEPA model capable of long-term predictions and multimodal integration, advancing towards AGI.
