Building the Next Generation of Physical Agents with Gemini Robotics-ER 1.5

18 hours ago

Gemini Robotics-ER 1.5 is now available to all developers as the first broadly accessible Gemini Robotics model.
The model specializes in visual and spatial understanding, task planning, and progress estimation, and it can call tools such as Google Search or vision-language-action models.
It is designed for complex robotics tasks requiring contextual information and multi-step execution, such as sorting objects based on local recycling rules.
Gemini Robotics-ER 1.5 acts as a high-level reasoning brain for robots, capable of understanding natural language commands and orchestrating complex behaviors.
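The orchestration role described here can be pictured as a small dispatch loop. The following is a minimal sketch, not the model's actual interface: the `Step` type, the `run_plan` helper, the tool names, and the hand-written plan for the recycling example are all hypothetical stand-ins for what a reasoning model and real tools (a search API, a vision-language-action policy) would supply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One planned action (hypothetical structure, for illustration only)."""
    tool: str      # name of the tool to call
    argument: str  # natural-language argument passed to that tool


def run_plan(plan: List[Step], tools: Dict[str, Callable[[str], str]]) -> List[str]:
    """Execute each step in order by dispatching to the named tool."""
    results = []
    for step in plan:
        if step.tool not in tools:
            raise KeyError(f"unknown tool: {step.tool}")
        results.append(tools[step.tool](step.argument))
    return results


# Stand-in tools; real ones would wrap a search API and a VLA policy.
def fake_search(query: str) -> str:
    return f"search results for: {query}"


def fake_vla(instruction: str) -> str:
    return f"executed: {instruction}"


TOOLS = {"search": fake_search, "vla": fake_vla}

# A plan of the kind a reasoning model might produce (hand-written here).
PLAN = [
    Step("search", "local recycling rules"),
    Step("vla", "pick up the bottle"),
    Step("vla", "place the bottle in the recycling bin"),
]

if __name__ == "__main__":
    for line in run_plan(PLAN, TOOLS):
        print(line)
```

In this sketch the high-level planner only decides which tool to call and with what instruction; the low-level execution stays inside the tools, which mirrors the division of labor the summary describes.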
The model excels in spatial-temporal reasoning, processing video to understand object relationships and actions over time.
Developers can balance latency and accuracy by adjusting the thinking token budget for different task complexities.
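As a rough illustration of that trade-off, the sketch below picks a thinking budget by task complexity and builds a request body. The budget values and the `build_request` helper are hypothetical, and the field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) are assumed to follow the Gemini API's REST request shape; check the current API reference before relying on them.

```python
from typing import Any, Dict

# Hypothetical budgets (in thinking tokens) per task complexity.
# Lower values favor latency, higher values favor accuracy; tune for your workload.
BUDGETS = {"simple": 0, "moderate": 512, "complex": 2048}


def build_request(prompt: str, complexity: str) -> Dict[str, Any]:
    """Build a request body with a thinking budget chosen by complexity.

    The JSON field names are an assumption about the API's request shape.
    """
    if complexity not in BUDGETS:
        raise ValueError(f"unknown complexity: {complexity}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": BUDGETS[complexity]},
        },
    }


if __name__ == "__main__":
    # A quick spatial query gets a small budget; a multi-step plan gets a larger one.
    print(build_request("Point to the mug.", "simple"))
    print(build_request("Plan how to sort these items for recycling.", "complex"))
```

A simple pointing query would typically use the low end of the range, while a multi-step planning task would justify the extra latency of a larger budget.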
Enhanced safety features include filters for harmful content and unsafe physical actions, though additional safety engineering is recommended.
The model is available in preview via Google AI Studio and the Gemini API, serving as a foundational component of the broader Gemini Robotics system.
