Gemini Robotics-ER 1.6
4 hours ago
- #AI Reasoning
- #Robotics
- #Autonomous Systems
- Gemini Robotics-ER 1.6 is an upgraded reasoning-first robotics model with stronger spatial reasoning and multi-view understanding, enabling greater autonomy.
- The model specializes in visual and spatial understanding, task planning, and success detection, and it can call tools such as Google Search or vision-language-action (VLA) models.
- Key improvements over previous versions include better pointing, counting, success detection, and new instrument reading capabilities for gauges and sight glasses.
- Pointing capabilities enable spatial reasoning, relational logic, motion reasoning, and constraint compliance as intermediate steps for complex tasks.
- Success detection allows robots to determine task completion and decide whether to retry or proceed, crucial for autonomy.
- Instrument reading combines spatial reasoning and world knowledge to interpret complex gauges, aiding in facility inspections with partners like Boston Dynamics.
- The model uses agentic vision, including zooming, pointing, and code execution, to achieve accurate instrument readings.
- Gemini Robotics-ER 1.6 is the safest Gemini Robotics model yet, with improved compliance with safety policies and better hazard identification in text and video scenarios.
- Developers can access the model via Gemini API and Google AI Studio, with a Colab provided for getting started.
- Collaboration is encouraged; users can submit labeled images of failure modes to help improve future model capabilities.
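The pointing capability mentioned above returns 2D points over an image. A minimal parsing sketch, assuming the response follows Gemini's documented pointing format of JSON objects with `point: [y, x]` coordinates normalized to a 0-1000 range (the model id and exact SDK call in the comment are assumptions, not from the post):

```python
import json

def parse_points(response_text, width, height):
    """Convert normalized [y, x] points (0-1000 range, per Gemini's
    documented pointing format) to (x, y) pixel coordinates."""
    # In practice response_text would come from a Gemini API call, e.g. via
    # the google-genai SDK's generate_content; that wiring is omitted here.
    text = response_text.strip()
    # Strip an optional markdown fence around the JSON payload.
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    points = json.loads(text)
    return [
        {
            "label": p.get("label", ""),
            "xy": (p["point"][1] / 1000 * width, p["point"][0] / 1000 * height),
        }
        for p in points
    ]

# Illustrative response text in the documented format.
sample = '[{"point": [500, 250], "label": "mug"}]'
print(parse_points(sample, width=640, height=480))
# → [{'label': 'mug', 'xy': (160.0, 240.0)}]
```

Downstream code (e.g. a grasp planner) can then consume the pixel coordinates directly.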
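The retry-or-proceed decision that success detection enables can be sketched as a simple control loop. This is a hypothetical scaffold, not the model's actual orchestration: `check_success` stands in for a call asking the model whether a post-action camera frame shows the task completed.

```python
def run_with_retries(execute, check_success, max_attempts=3):
    """Retry a skill until a success check passes.

    execute: runs the robot skill once (stubbed here).
    check_success: stand-in for a model query such as
      "Does this image show the drawer fully closed?" -> bool.
    Returns the attempt number on success, or None to signal
    that the caller should replan or escalate.
    """
    for attempt in range(1, max_attempts + 1):
        execute()
        if check_success():
            return attempt
    return None

# Demo with stubs: the check "passes" on the second attempt.
outcomes = iter([False, True])
result = run_with_retries(lambda: None, lambda: next(outcomes))
print(result)  # → 2
```

Returning `None` rather than raising keeps the decision of whether to replan, ask for help, or abort with the higher-level planner.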