Gemini Robotics-ER 1.6
4 hours ago
- #AI Reasoning
- #Robotics
- #Autonomous Systems
- Gemini Robotics-ER 1.6 is an upgraded reasoning-first robotics model with stronger spatial reasoning and multi-view understanding, enabling greater autonomy.
- The model specializes in visual and spatial understanding, task planning, and success detection, and it can call tools such as Google Search or vision-language-action (VLA) models.
- Key improvements over previous versions include better pointing, counting, success detection, and new instrument reading capabilities for gauges and sight glasses.
- Pointing capabilities enable spatial reasoning, relational logic, motion reasoning, and constraint compliance as intermediate steps for complex tasks.
- Success detection allows robots to determine task completion and decide whether to retry or proceed, crucial for autonomy.
- Instrument reading combines spatial reasoning and world knowledge to interpret complex gauges, aiding in facility inspections with partners like Boston Dynamics.
- The model uses agentic vision, including zooming, pointing, and code execution, to achieve accurate instrument readings.
- Gemini Robotics-ER 1.6 is the safest Gemini Robotics model yet, with improved compliance with safety policies and better hazard identification in text and video scenarios.
- Developers can access the model via Gemini API and Google AI Studio, with a Colab provided for getting started.
- Collaboration is encouraged; users can submit labeled images of failure modes to help improve future model capabilities.
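The pointing capability mentioned above returns 2D points over an image. A minimal parsing sketch, assuming the response follows Gemini's documented pointing format of JSON objects with `point: [y, x]` coordinates normalized to a 0-1000 range (the model id and exact SDK call in the comment are assumptions, not from the post):

```python
import json

def parse_points(response_text, width, height):
    """Convert normalized [y, x] points (0-1000 range, per Gemini's
    documented pointing format) to (x, y) pixel coordinates."""
    # In practice response_text would come from a Gemini API call, e.g. via
    # the google-genai SDK's generate_content; that wiring is omitted here.
    text = response_text.strip()
    # Strip an optional markdown fence around the JSON payload.
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    points = json.loads(text)
    return [
        {
            "label": p.get("label", ""),
            "xy": (p["point"][1] / 1000 * width, p["point"][0] / 1000 * height),
        }
        for p in points
    ]

# Illustrative response text in the documented format.
sample = '[{"point": [500, 250], "label": "mug"}]'
print(parse_points(sample, width=640, height=480))
# → [{'label': 'mug', 'xy': (160.0, 240.0)}]
```

Downstream code (e.g. a grasp planner) can then consume the pixel coordinates directly.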
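The retry-or-proceed decision that success detection enables can be sketched as a simple control loop. This is a hypothetical scaffold, not the model's actual orchestration: `check_success` stands in for a call asking the model whether a post-action camera frame shows the task completed.

```python
def run_with_retries(execute, check_success, max_attempts=3):
    """Retry a skill until a success check passes.

    execute: runs the robot skill once (stubbed here).
    check_success: stand-in for a model query such as
      "Does this image show the drawer fully closed?" -> bool.
    Returns the attempt number on success, or None to signal
    that the caller should replan or escalate.
    """
    for attempt in range(1, max_attempts + 1):
        execute()
        if check_success():
            return attempt
    return None

# Demo with stubs: the check "passes" on the second attempt.
outcomes = iter([False, True])
result = run_with_retries(lambda: None, lambda: next(outcomes))
print(result)  # → 2
```

Returning `None` rather than raising keeps the decision of whether to replan, ask for help, or abort with the higher-level planner.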