Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

  • #joint embedding predictive architecture
  • #representation learning
  • #world models
  • LeWorldModel (LeWM) is a new Joint Embedding Predictive Architecture (JEPA) designed for stable end-to-end training from pixels.
  • Unlike prior JEPA methods, it avoids representation collapse using only two loss terms: a next-embedding prediction loss and a Gaussian regularization term on the latent embeddings (sketched in code after this list).
  • This reduces the number of tunable hyperparameters from six, in existing end-to-end alternatives, to one.
  • LeWM has about 15 million parameters, can be trained on a single GPU in hours, and enables planning up to 48 times faster than foundation-model-based world models.
  • It shows competitive performance across 2D and 3D control tasks, and its latent space encodes meaningful physical structure.
  • A surprise evaluation confirms that the model reliably detects physically implausible events, supporting its robustness (a minimal surprise score is sketched below).
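
To make the two-loss design concrete, here is a minimal sketch. The brief does not specify the exact form of LeWM's Gaussian regularizer, so the moment-matching penalty, the function name `lewm_losses`, and the weighting coefficient `beta` below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def lewm_losses(pred_next_emb, target_next_emb, latent_emb):
    """Hypothetical sketch of the two loss terms described above.

    pred_next_emb:   predictor output for the next-step embedding
    target_next_emb: encoder embedding of the actual next observation
    latent_emb:      batch of current latent embeddings
    """
    # Next-embedding prediction loss: match predicted and observed embeddings.
    pred_loss = F.mse_loss(pred_next_emb, target_next_emb)

    # Gaussian regularization (assumed moment-matching form): pull the batch
    # of latents toward a standard normal by penalizing deviation of the
    # per-dimension mean from 0 and variance from 1.
    mean = latent_emb.mean(dim=0)
    var = latent_emb.var(dim=0)
    gauss_reg = (mean.pow(2) + (var - 1.0).pow(2)).mean()

    # A single weighting coefficient would be the one tunable hyperparameter
    # mentioned in the summary; the value here is illustrative only.
    beta = 1.0
    return pred_loss + beta * gauss_reg
```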
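
For the surprise evaluation, one plausible reading is that surprise is measured as the latent prediction error on observed transitions. The function name `surprise_score` and the threshold in the usage comment are assumptions for illustration, not the authors' metric.

```python
import torch

def surprise_score(pred_next_emb: torch.Tensor, target_next_emb: torch.Tensor) -> torch.Tensor:
    """Per-sample latent prediction error, used here as a surprise signal.

    High values would flag transitions the model considers physically
    implausible (assumed reading of the brief).
    """
    return (pred_next_emb - target_next_emb).pow(2).mean(dim=-1)

# Example usage: flag transitions whose error exceeds a chosen threshold.
# The threshold value is illustrative only.
# anomalous = surprise_score(pred, target) > 0.5
```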