A Steerable Model with Emergent Capabilities

#robotic foundation models
#compositional generalization
#multimodal prompting

π0.7 is a new general-purpose robotic foundation model with strong generalization: it performs dexterous tasks on par with fine-tuned specialist models and follows new language commands for tasks it has not seen before.
The model shows early signs of compositional generalization, recombining skills learned on different tasks to solve novel problems, such as operating kitchen appliances it has not used before or enabling a new robot to fold laundry without task-specific training data for that robot.
Key to π0.7's broad generalization is training on diverse data sources (e.g., data from various robots, human data, and autonomous episodes) paired with multimodal prompts that include language descriptions, metadata, control labels, and visual subgoals.
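To make the idea of a multimodal prompt concrete, here is a minimal Python sketch of one possible prompt container. The class name `MultimodalPrompt`, its fields, and the ordering used by `to_segments` are hypothetical illustrations, not π0.7's actual format; the sketch only shows how optional language, metadata, control-label, and visual-subgoal components could be flattened into a single conditioning sequence.

```python
from dataclasses import dataclass, field


# Hypothetical prompt container: field names and ordering are illustrative
# assumptions, not π0.7's real prompt format.
@dataclass
class MultimodalPrompt:
    language: str = ""                                        # task description or step-by-step coaching
    metadata: dict[str, str] = field(default_factory=dict)   # e.g. robot type, episode quality
    control_labels: list[str] = field(default_factory=list)  # e.g. control-mode tags
    subgoal_images: list[str] = field(default_factory=list)  # references to visual subgoal frames

    def to_segments(self) -> list[tuple[str, str]]:
        """Flatten the prompt into ordered (modality, content) pairs, skipping empty parts."""
        segments: list[tuple[str, str]] = []
        if self.language:
            segments.append(("text", self.language))
        # Sort metadata keys so the same prompt always yields the same sequence.
        for key, value in sorted(self.metadata.items()):
            segments.append(("metadata", f"{key}={value}"))
        for label in self.control_labels:
            segments.append(("control", label))
        for image_ref in self.subgoal_images:
            segments.append(("image", image_ref))
        return segments


prompt = MultimodalPrompt(
    language="fold the shirt",
    metadata={"robot": "bimanual_ur5e"},
    subgoal_images=["subgoal_000.png"],
)
print(prompt.to_segments())
# -> [('text', 'fold the shirt'), ('metadata', 'robot=bimanual_ur5e'), ('image', 'subgoal_000.png')]
```

Keeping every component optional mirrors the idea that one model can be prompted with any subset of these signals; how π0.7 actually encodes and orders them is not described here.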
The model can integrate several kinds of prompts, such as step-by-step language coaching and synthetically generated visual subgoals, to improve task performance; annotations also let it learn from suboptimal data.
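One common way annotations can make suboptimal data usable is quality conditioning: every episode is kept for training but tagged with a label, and at inference time the prompt requests the best label. The sketch below assumes that technique; the `Episode` record, the `annotate` labeling rule, and the prompt strings are hypothetical and are not confirmed as π0.7's method.

```python
from dataclasses import dataclass


# Hypothetical episode record; the fields are illustrative assumptions.
@dataclass
class Episode:
    task: str
    succeeded: bool
    num_mistakes: int


def annotate(episode: Episode) -> dict[str, str]:
    """Assumed labeling rule: tag each episode with a coarse quality label instead of discarding it."""
    if episode.succeeded and episode.num_mistakes == 0:
        quality = "good"
    elif episode.succeeded:
        quality = "recovered"
    else:
        quality = "failed"
    return {"quality": quality}


def training_prompt(episode: Episode) -> str:
    """Every episode stays in the training set; its annotation becomes part of the prompt."""
    meta = annotate(episode)
    return f"{episode.task} | quality: {meta['quality']}"


def inference_prompt(task: str) -> str:
    """At test time, ask for the behavior seen in the best-labeled episodes."""
    return f"{task} | quality: good"


episodes = [
    Episode("peel the carrot", succeeded=True, num_mistakes=0),
    Episode("peel the carrot", succeeded=True, num_mistakes=2),
    Episode("peel the carrot", succeeded=False, num_mistakes=1),
]
for ep in episodes:
    print(training_prompt(ep))
print(inference_prompt("peel the carrot"))
```

Because the recovered and failed episodes are labeled rather than discarded, a model trained this way could still learn useful motion from them, while the inference-time prompt steers it toward the behavior of the best-labeled episodes.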
π0.7 generalizes effectively across platforms, controlling different robotic systems without task-specific training data (e.g., a bimanual UR5e folding laundry) and matching the success rates of human teleoperators.
The model matches or exceeds the success rates and throughput of specialist RL-trained models (such as those trained with Recap) by distilling experience and strategy metadata into a single unified model.
π0.7 handles a wide range of tasks, such as peeling vegetables and cleaning, which points to its potential for real-world robotics applications and to future advances in semantic reasoning and physical generalization.
