Gemini 2.5 Computer Use model
4 hours ago
- #AI
- #User Interface
- #Automation
- The Gemini 2.5 Computer Use model has been released; it is built on Gemini 2.5 Pro’s visual understanding and reasoning capabilities.
- The model enables agents to interact with user interfaces (UIs) for tasks like filling forms, clicking, and scrolling.
- It outperforms leading alternatives on web and mobile control benchmarks while running at lower latency.
- Inputs to the model are the user request, a screenshot of the current environment, and the history of recent actions; developers can optionally exclude specific UI actions or add custom functions.
- The model operates in a loop: it analyzes the inputs and generates a UI action, the client executes that action and captures a new screenshot, and the cycle repeats until the task is complete.
- The model is optimized for web browsers and shows potential for mobile UI control, but it is not yet optimized for desktop OS-level tasks.
- It includes built-in safety features to mitigate risks such as misuse, unexpected behavior, and prompt injection.
- Developers can implement additional safety controls, such as per-step safety checks and system instructions.
- Early testers have used the model for UI testing, personal assistants, and workflow automation.
- The model is available in public preview through the Gemini API on Google AI Studio and Vertex AI, along with demos and documentation.
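
The loop and the per-step safety check described in the points above can be sketched as follows. This is a minimal illustration, not the Gemini API: `Action`, `run_agent_loop`, `scripted_model`, and the `needs_confirmation` flag are hypothetical names invented for this example, and the "model" is a scripted stand-in so the code runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """A hypothetical UI action proposed by the model."""
    name: str                                  # e.g. "click", "type_text", "scroll"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False           # assumed flag for risky steps


def run_agent_loop(
    request: str,
    propose_action: Callable[[str, bytes, List[Action]], Optional[Action]],
    take_screenshot: Callable[[], bytes],
    execute_action: Callable[[Action], None],
    confirm: Callable[[Action], bool] = lambda action: False,
    max_steps: int = 20,
) -> List[Action]:
    """Run the observe -> propose -> check -> execute cycle until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()         # current UI state
        action = propose_action(request, screenshot, history)
        if action is None:                     # model signals task completion
            break
        # Per-step safety check: stop unless a risky action is approved.
        if action.needs_confirmation and not confirm(action):
            break
        execute_action(action)                 # client-side execution
        history.append(action)
    return history


def scripted_model(request: str, screenshot: bytes,
                   history: List[Action]) -> Optional[Action]:
    """Stand-in for the real model: replays a fixed list of actions."""
    script = [
        Action("click", {"x": 120, "y": 340}),
        Action("type_text", {"text": "hello"}),
        Action("submit_payment", needs_confirmation=True),
    ]
    return script[len(history)] if len(history) < len(script) else None


executed: List[Action] = []
history = run_agent_loop(
    "Fill in the form",
    scripted_model,
    take_screenshot=lambda: b"",               # placeholder screenshot bytes
    execute_action=executed.append,
)
print([a.name for a in history])               # ['click', 'type_text']
```

Because the default `confirm` callback declines every request, the loop stops before the `submit_payment` step. A real integration would replace `scripted_model` with a call to the model and `confirm` with a prompt to the end user.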