Gemini 2.5 Computer Use model

4 hours ago


Google has released the Gemini 2.5 Computer Use model, built on Gemini 2.5 Pro’s visual understanding and reasoning capabilities.
The model enables agents to interact with user interfaces (UIs) for tasks like filling forms, clicking, and scrolling.
It outperforms leading alternatives on web and mobile control benchmarks with lower latency.
Inputs to the model are the user request, a screenshot of the environment, and a history of recent actions; developers can optionally exclude specific UI actions or add custom functions.
The model operates in a loop: it analyzes the inputs and generates a UI action, the action is executed, and the cycle repeats until the task is complete.
The model is optimized for web browsers and shows potential for mobile UI control, but it is not yet optimized for desktop OS-level tasks.
It includes safety features to mitigate risks such as misuse, unexpected behavior, and prompt injection.
Developers can implement additional safety controls, such as per-step safety checks and system instructions.
Early testers have used the model for UI testing, personal assistants, and workflow automation.
The model is available in public preview through the Gemini API in Google AI Studio and Vertex AI, with demos and documentation.
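
The loop and per-step safety check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Gemini API: `Action`, `run_agent_loop`, `scripted_model`, and every parameter name are hypothetical, and the model is replaced by a scripted stand-in so that the control flow can run without network access or a browser.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape, not the real schema)."""
    name: str                          # e.g. "click", "type", "scroll", or "done"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False   # stand-in for a per-step safety flag


def run_agent_loop(
    request: str,
    propose: Callable[[str, bytes, List[Action]], Action],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    excluded: frozenset = frozenset(),
    max_steps: int = 20,
) -> List[Action]:
    """Repeat: screenshot -> model proposes an action -> execute it, until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        # Inputs per step: user request, current screenshot, action history.
        action = propose(request, take_screenshot(), history)
        if action.name == "done":
            break
        if action.name in excluded:
            raise ValueError(f"model proposed excluded action: {action.name}")
        # Per-step safety check: stop if a flagged action is not confirmed.
        if action.needs_confirmation and not confirm(action):
            break
        execute(action)
        history.append(action)
    return history


def scripted_model(steps: List[Action]):
    """Hypothetical stand-in for the model: replays a fixed list of actions."""
    remaining = iter(steps)

    def propose(request: str, screenshot: bytes, history: List[Action]) -> Action:
        return next(remaining, Action("done"))

    return propose


if __name__ == "__main__":
    plan = [Action("click", {"x": 10, "y": 20}), Action("type", {"text": "hello"})]
    done = run_agent_loop(
        "fill in the form",
        scripted_model(plan),
        take_screenshot=lambda: b"",   # a real client would capture the browser here
        execute=lambda a: print("executing", a.name, a.args),
        confirm=lambda a: True,
    )
    print(len(done), "actions executed")
```

In a real client, `propose` would call the model and `execute` would drive a browser. The `max_steps` cap and the `confirm` callback are design choices of this sketch, meant to keep a misbehaving run bounded and to leave a human or policy check in control of flagged steps.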
