Hasty Briefs beta

Gemini Robotics-ER 1.6

4 hours ago
  • #AI Reasoning
  • #Robotics
  • #Autonomous Systems
  • Gemini Robotics-ER 1.6 is an upgraded reasoning-first model for robotics that enhances spatial reasoning and multi-view understanding for greater autonomy.
  • The model specializes in visual and spatial understanding, task planning, and success detection, and can call tools such as Google Search or vision-language-action (VLA) models.
  • Key improvements over previous versions include better pointing, counting, and success detection, plus a new instrument-reading capability for gauges and sight glasses.
  • Pointing serves as an intermediate step for complex tasks, supporting spatial reasoning, relational logic, motion reasoning, and constraint compliance.
  • Success detection lets a robot determine whether a task is complete and decide whether to retry or proceed, which is crucial for autonomy.
  • Instrument reading combines spatial reasoning and world knowledge to interpret complex gauges, supporting facility inspections with partners such as Boston Dynamics.
  • The model uses agentic vision, including zooming, pointing, and code execution, to achieve accurate instrument readings.
  • Gemini Robotics-ER 1.6 is described as the safest robotics model yet, with improved compliance with safety policies and better hazard identification in text and video scenarios.
  • Developers can access the model via the Gemini API and Google AI Studio, and a Colab notebook is provided for getting started.
  • Collaboration is encouraged; users can submit labeled images of failure modes to help improve future model capabilities.
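
The retry-or-proceed behavior described under success detection can be sketched as a small control loop. This is a minimal illustration, not the model's actual interface: `run_with_success_detection`, `execute`, `detect_success`, and `max_attempts` are hypothetical names, and `detect_success` stands in for whatever call asks the model whether the task is complete.

```python
from typing import Callable


def run_with_success_detection(
    execute: Callable[[], None],
    detect_success: Callable[[], bool],
    max_attempts: int = 3,
) -> int:
    """Run a task until the detector reports success.

    Returns the number of attempts used. Raises RuntimeError if the
    detector never reports success within max_attempts.
    Both callables are hypothetical placeholders: `execute` would drive
    the robot, `detect_success` would query a success-detection model.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        execute()                 # perform (or retry) the task
        if detect_success():      # ask the detector whether the task is done
            return attempt        # success: the caller can proceed
    raise RuntimeError(f"task not completed after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated task: the stub detector reports success on the second attempt.
    state = {"runs": 0}

    def fake_execute() -> None:
        state["runs"] += 1

    def fake_detect() -> bool:
        return state["runs"] >= 2

    print(run_with_success_detection(fake_execute, fake_detect))
```

Passing the detector in as a callable keeps the loop independent of any particular model or SDK, so the same structure works with a stub in tests and a real model call in deployment.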