Building the Next Generation of Physical Agents with Gemini Robotics-ER 1.5
- #AI
- #machine learning
- #robotics
- Gemini Robotics-ER 1.5 is now available to all developers as the first broadly accessible Gemini Robotics model.
- The model specializes in visual and spatial understanding, task planning, and progress estimation, and it can call tools such as Google Search or vision-language-action (VLA) models.
- It is designed for complex robotics tasks requiring contextual information and multi-step execution, such as sorting objects based on local recycling rules.
- Gemini Robotics-ER 1.5 acts as a high-level reasoning brain for robots, capable of understanding natural language commands and orchestrating complex behaviors.
- The model excels at spatiotemporal reasoning, processing video to track object relationships and actions over time.
- Developers can balance latency and accuracy by adjusting the thinking token budget for different task complexities.
- Enhanced safety features include filters for harmful content and unsafe physical actions, though additional safety engineering is recommended.
- The model is available in preview via Google AI Studio and the Gemini API, serving as a foundational component of the broader Gemini Robotics system.
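To make the latency/accuracy trade-off concrete, here is a minimal sketch of how a request with a thinking token budget might be assembled. It builds the JSON payload locally using field names that follow the public Gemini API REST conventions (`generationConfig.thinkingConfig.thinkingBudget`); the exact model ID and field names are assumptions, so check the current Gemini API reference before relying on them.

```python
import json

# Hypothetical model ID for the preview release; verify against the
# Gemini API model list before use.
MODEL_ID = "gemini-robotics-er-1.5-preview"

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Assemble a Gemini-style generateContent payload.

    thinking_budget caps the tokens the model may spend "thinking":
    a low budget favors latency, a high budget favors accuracy on
    complex multi-step planning tasks.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# A quick perception query can run with little or no thinking budget...
fast = build_request("Point to every recyclable item in the image.", 0)
# ...while multi-step planning benefits from a larger one.
careful = build_request("Plan the steps to sort this trash per local rules.", 1024)

print(json.dumps(fast, indent=2))
print(careful["generationConfig"]["thinkingConfig"]["thinkingBudget"])
```

The payload would then be POSTed to the `generateContent` endpoint (or passed through an SDK) with your API key; only the budget value changes between the two calls, which is what lets one deployment serve both quick perception queries and slower deliberate planning.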