Qodo CLI agent scores 71.2% on SWE-bench Verified
12 days ago
- #Software Engineering
- #AI Coding Agents
- #Benchmark Performance
- Qodo Command achieved 71.2% on SWE-bench Verified, a benchmark for AI agents in real-world software engineering tasks.
- The CLI agent excels in code review, test writing, bug fixing, and feature generation, producing context-aware code.
- SWE-bench tests agents in complex, real-world scenarios using real GitHub issues from 12 Python repositories.
- The score came from a single run with no benchmark-specific adjustments, and the agent is available via npm install.
- Claude 4 is the preferred LLM for Qodo Command, supported by a partnership with Anthropic.
- Key architectural features include context summarization, execution planning, and retry mechanisms; the agent also uses the LangGraph framework.
- Agent tools include FileSystem, Shell Tool, Ripgrep, Sequential Thinking, and Web Search (disabled for SWE-bench).
- Qodo Command focuses on automation with integrity, offering code review, test generation, and documentation automation.
- It includes a UI mode for reviewing code with Qodo Merge, intended to ensure quality and correctness in AI-assisted tasks.
- Qodo Command is available for production use via npm install and aims to enhance code integrity workflows.
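
The summary names retry mechanisms and context summarization as architectural features but does not describe how Qodo Command implements them. The following is only a minimal sketch of the general idea, not Qodo's code: `summarize`, `run_with_retries`, the `keep_last` window, and the placeholder text are all hypothetical names chosen for illustration.

```python
from typing import Callable, List


def summarize(history: List[str], keep_last: int = 2) -> List[str]:
    """Collapse older entries into one placeholder line, keeping the newest ones.

    Hypothetical stand-in for context summarization: a real agent would
    likely ask an LLM to write the summary instead of using a fixed string.
    """
    if len(history) <= keep_last:
        return list(history)
    return ["[earlier steps summarized]"] + history[-keep_last:]


def run_with_retries(tool: Callable[[], str], max_attempts: int = 3) -> List[str]:
    """Call a tool until it succeeds or attempts run out, logging each outcome."""
    history: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = tool()
        except Exception as exc:
            # Record the failure, compact the log, and try again.
            history.append(f"attempt {attempt} failed: {exc}")
            history = summarize(history)
            continue
        history.append(f"attempt {attempt} ok: {result}")
        return history
    # Every attempt failed; the caller can inspect the compacted log.
    return history
```

Bounding the number of attempts keeps a failing tool call from looping forever, and compacting the log after each failure keeps the running context short, which is the usual motivation for pairing the two.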