GitHub - bytedance/UI-TARS-desktop: The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
2 months ago
- #Multimodal
- #Automation
- #AI Agent
- TARS is a multimodal AI agent stack made up of two main projects: Agent TARS and UI-TARS-desktop.
- Agent TARS is a general multimodal AI Agent stack integrating GUI Agent and Vision capabilities into terminals, computers, browsers, and products.
- Agent TARS provides a CLI and a Web UI and aims for human-like task completion by combining multimodal LLMs with MCP (Model Context Protocol) tool integration.
- UI-TARS Desktop is a native GUI Agent application for local and remote computer and browser operations.
- Recent updates include Agent TARS CLI v0.3.0, which adds streaming support, runtime settings, and support for the AIO agent Sandbox.
- UI-TARS Desktop v0.2.0 introduced Remote Computer and Browser Operators for easy remote control.
- UI-TARS SDK is a cross-platform toolkit for building GUI automation agents.
- Agent TARS supports hybrid browser control, event streams, and MCP integration.
- UI-TARS Desktop features natural language control, visual recognition, and cross-platform support.
- The project is open source under the Apache License 2.0 and asks that research using it cite the project.