GEN-0 / Embodied Foundation Models That Scale with Physical Interaction
18 days ago
- #foundation-models
- #embodied-AI
- #robotics
- GEN-0 is a new class of embodied foundation models designed for multimodal training on high-fidelity raw physical interaction.
- The model features Harmonic Reasoning, which lets it think and act simultaneously, and it scales to large model sizes (10B+ parameters).
- A phase transition is observed at 7B parameters: smaller models ossify (stop improving) under data overload, while larger ones continue to improve.
- GEN-0 exhibits strong scaling laws, showing predictable improvements in downstream performance with more pretraining data and compute.
- The model is pretrained on 270,000+ hours of real-world manipulation data, growing at 10,000 hours per week.
- Performance is measured using validation prediction MSE and reverse KL divergence; the latter helps assess mode-seeking behavior.
- Data quality and diversity matter more than sheer volume for pretraining effectiveness.
- GEN-0 works across different robot embodiments (6DoF, 7DoF, 16+DoF semi-humanoid robots).
- The architecture avoids System 1/System 2 dependencies, relying instead on Harmonic Reasoning for continuous-time sensing and acting.
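
The scaling-law point above can be made concrete with a small fit. The summary does not state the functional form, so this sketch assumes a power law, error = a * hours^(-b), which is a common choice for scaling-law fits. The numbers are synthetic, generated from made-up constants `TRUE_A` and `TRUE_B`, and are not GEN-0 measurements. The helper names `fit_power_law` and `predict` are hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by ordinary least squares in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log y = intercept + slope * log x, so a = exp(intercept) and b = -slope.
    return math.exp(intercept), -slope


def predict(a, b, x):
    """Evaluate the fitted power law at x."""
    return a * x ** (-b)


# Synthetic illustration only: these constants are assumptions, not reported results.
TRUE_A, TRUE_B = 2.0, 0.3
hours = [1e3, 1e4, 1e5, 2.7e5]  # hypothetical pretraining-data sizes in hours
errors = [TRUE_A * h ** (-TRUE_B) for h in hours]

a, b = fit_power_law(hours, errors)
extrapolated = predict(a, b, 1e6)  # predicted error at a larger data budget
```

If a fit like this holds on real measurements, the fitted `a` and `b` give a way to estimate how much additional data is needed to reach a target error. That is the sense in which improvements are "predictable".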
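
The two metrics named above can be sketched on a toy problem to show why reverse KL is described as mode-seeking. This assumes the usual convention that reverse KL is KL(model || data) and uses small discrete distributions. How GEN-0 estimates these quantities on real action distributions is not stated in the summary. All function and variable names here are hypothetical.

```python
import math


def mse(predicted, target):
    """Mean squared error between two equal-length sequences."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def kl_divergence(a, b):
    """KL(a || b) for discrete distributions; assumes b > 0 wherever a > 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


def reverse_kl(model, data):
    """KL(model || data): penalizes model mass placed where the data has little."""
    return kl_divergence(model, data)


def forward_kl(model, data):
    """KL(data || model): penalizes the model for missing mass the data has."""
    return kl_divergence(data, model)


# Toy bimodal "data" distribution over four bins (illustrative values only).
data = [0.49, 0.01, 0.01, 0.49]
single_mode = [0.97, 0.01, 0.01, 0.01]  # commits to one mode
broad = [0.25, 0.25, 0.25, 0.25]        # spreads mass over every bin

# Reverse KL prefers the model that commits to one mode (mode-seeking),
# while forward KL prefers the model that covers both modes.
prefers_single_mode = reverse_kl(single_mode, data) < reverse_kl(broad, data)
prefers_broad = forward_kl(broad, data) < forward_kl(single_mode, data)
```

Reading the two metrics together is useful because a model can have low prediction error on average while still spreading probability over actions the data never takes. Reverse KL is sensitive to that case.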