GEN-0 / Embodied Foundation Models That Scale with Physical Interaction

18 days ago

GEN-0 is a new class of embodied foundation models designed for multimodal training on high-fidelity raw physical interaction.
The model features Harmonic Reasoning, enabling seamless thinking and acting, and scales with large model sizes (10B+ parameters).
A phase transition is observed at 7B parameters: smaller models ossify under data overload, while larger ones continue to improve.
GEN-0 exhibits strong scaling laws, showing predictable improvements in downstream performance with more pretraining data and compute.
The model is pretrained on more than 270,000 hours of real-world manipulation data, a corpus that is growing by 10,000 hours per week.
Performance is measured using validation prediction MSE and reverse KL divergence; the latter helps assess mode-seeking behavior.
Data quality and diversity are found to be more critical than sheer volume for pretraining effectiveness.
GEN-0 works across different robot embodiments (6DoF, 7DoF, 16+DoF semi-humanoid robots).
The architecture does not depend on a System 1/System 2 split, relying instead on Harmonic Reasoning for continuous-time sensing and acting.
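
The two evaluation metrics named above, validation prediction MSE and reverse KL divergence, can be illustrated with a small sketch. The summary does not say how GEN-0 computes either one, so everything here is an assumption made for illustration: the function names are hypothetical, actions are treated as a discrete set of bins, and the example distributions are invented toy values rather than GEN-0 results.

```python
import math

def validation_mse(predicted, target):
    """Mean squared error between predicted and held-out target values."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

def reverse_kl(model_probs, data_probs, eps=1e-12):
    """Reverse KL, KL(model || data), over discrete bins (an assumed setup).

    The expectation is taken under the model, so the model is penalized
    heavily for putting mass where the data has little, and only lightly
    for ignoring a mode of the data. That asymmetry is what makes the
    measure sensitive to mode-seeking behavior.
    """
    q_total = sum(model_probs)
    p_total = sum(data_probs)
    total = 0.0
    for q_raw, p_raw in zip(model_probs, data_probs):
        q = q_raw / q_total
        p = max(p_raw / p_total, eps)  # guard against log of zero
        if q > 0:  # bins with zero model mass contribute nothing
            total += q * math.log(q / p)
    return total

# Toy bimodal "data" distribution over three action bins (invented values).
data = [0.49, 0.02, 0.49]
# A model that commits to one mode versus one that averages the two modes.
mode_seeking = [0.98, 0.01, 0.01]
mode_averaging = [0.10, 0.80, 0.10]

seeking_score = reverse_kl(mode_seeking, data)
averaging_score = reverse_kl(mode_averaging, data)
```

In this toy case the model that commits to a single mode scores a lower reverse KL than the one that averages between the modes, which is the sense in which the metric rewards mode-seeking predictions.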
