Qodo CLI agent scores 71.2% on SWE-bench Verified

12 days ago


Qodo Command achieved 71.2% on SWE-bench Verified, a benchmark for AI agents in real-world software engineering tasks.
The CLI agent excels at code review, test writing, bug fixing, and feature generation, producing context-aware code.
SWE-bench tests agents in complex, real-world scenarios using real GitHub issues from 12 Python repositories.
The 71.2% score came from a single run with no benchmark-specific adjustments; the agent is available via npm install.
Claude 4 is the preferred LLM for Qodo Command, supported by a partnership with Anthropic.
Key architectural elements include context summarization, execution planning, retry mechanisms, and the LangGraph framework.
Agent tools include FileSystem, Shell Tool, Ripgrep, Sequential Thinking, and Web Search (disabled for SWE-bench).
Qodo Command focuses on automation with integrity, offering code review, test generation, and documentation automation.
It includes a UI mode for reviewing code with Qodo Merge, which helps ensure quality and correctness in AI-assisted tasks.
Qodo Command is available for production use via npm install and aims to enhance code integrity workflows.
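
The summary names retry mechanisms among the architectural features but does not describe how they work. As a rough illustration of the general idea only, and not Qodo Command's actual implementation, a tool call can be wrapped so that a transient failure is retried a bounded number of times before the error is surfaced. The names call_with_retry, max_attempts, base_delay, and flaky_tool are hypothetical and chosen for this sketch.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(tool: Callable[[], T], max_attempts: int = 3,
                    base_delay: float = 0.0) -> T:
    """Run `tool`, retrying on any exception up to `max_attempts` times.

    Hypothetical helper for illustration; waits base_delay * 2**(attempt-1)
    seconds between attempts (exponential backoff).
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return tool()
        except Exception as exc:  # a real agent would catch narrower errors
            last_error = exc
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"tool failed after {max_attempts} attempts") from last_error


# Stand-in for an unreliable tool: fails twice, then succeeds.
calls = {"n": 0}


def flaky_tool() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient failure")
    return "ok"


result = call_with_retry(flaky_tool)
print(result, calls["n"])
```

Bounding the number of attempts keeps a persistently failing tool from stalling the agent, and chaining the last exception preserves the original cause for debugging.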
