How Google built its Gemini robotics models

2 days ago
  • #Gemini
  • #AI
  • #robotics
  • Google DeepMind developed a new family of Gemini Robotics models, specifically designed for robots.
  • The models are multimodal: they build on Gemini 2.0 and are fine-tuned with robot-specific data, which adds physical actions as an output alongside text, video, and audio.
  • A bi-arm ALOHA robot successfully performed novel tasks like placing pens inside a shoe and executing a slam dunk with a toy basketball, demonstrating the model's adaptability.
  • Gemini Robotics models are highly dexterous, interactive, and general, allowing robots to react to new objects, environments, and instructions without additional training.
  • Two main functions are essential for robots: understanding and decision-making (handled by Gemini Robotics-ER) and taking action (handled by Gemini Robotics).
  • Gemini Robotics-ER excels at embodied reasoning, including detecting objects and generating code for actions, while Gemini Robotics advances dexterity and multi-step task completion.
  • The models adapt to various robot embodiments, from academic robots like ALOHA to humanoid robots like Apollo, enabling diverse applications.
  • Potential future applications include complex industrial settings and human-centric spaces like homes, though widespread adoption is still years away.
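
The split described above, with one component that understands and decides and another that acts, can be illustrated with a toy sketch. Everything in it is hypothetical: the names `Detection`, `Step`, `plan`, and `execute` are invented for illustration, the "reasoning" is plain keyword matching, and none of it reflects the actual Gemini Robotics models or any Google API.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """A detected object (hypothetical format: a label plus 2D coordinates)."""
    label: str
    x: float
    y: float


@dataclass
class Step:
    """One planned action applied to one detected object."""
    action: str
    target: Detection


def plan(instruction: str, detections: list) -> list:
    """Toy stand-in for the understanding and decision-making stage.

    Keeps the detected objects named in the instruction, in order of
    mention, and plans to pick the first and place it at the second.
    """
    words = instruction.lower().split()
    mentioned = [d for d in detections if d.label in words]
    mentioned.sort(key=lambda d: words.index(d.label))
    if len(mentioned) < 2:
        return []  # not enough recognized objects to build a plan
    return [Step("pick", mentioned[0]), Step("place", mentioned[1])]


def execute(steps: list) -> list:
    """Toy stand-in for the action stage: log each step instead of moving."""
    return [f"{s.action} {s.target.label} at ({s.target.x}, {s.target.y})"
            for s in steps]


if __name__ == "__main__":
    scene = [Detection("shoe", 0.7, 0.1), Detection("pen", 0.2, 0.5)]
    for line in execute(plan("put the pen in the shoe", scene)):
        print(line)
```

Keeping the planner and the executor behind separate functions mirrors the division of labor in the summary: the planning logic could be swapped out without touching the code that carries out each step.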