Hasty Briefs (beta)

Qodo CLI agent scores 71.2% on SWE-bench Verified

12 days ago
  • #Software Engineering
  • #AI Coding Agents
  • #Benchmark Performance
  • Qodo Command achieved 71.2% on SWE-bench Verified, a benchmark that evaluates AI agents on real-world software engineering tasks.
  • The CLI agent handles code review, test writing, bug fixing, and feature generation, producing context-aware code.
  • SWE-bench tests agents in complex, real-world scenarios using real GitHub issues from 12 Python repositories.
  • The 71.2% score came from a single run without benchmark-specific adjustments, using the version available via npm install.
  • Claude 4 is the preferred LLM for Qodo Command, supported by a partnership with Anthropic.
  • Key architectural features include context summarization, execution planning, and retry mechanisms, with orchestration built on LangGraph.
  • Agent tools include FileSystem, Shell Tool, Ripgrep, Sequential Thinking, and Web Search (disabled for SWE-bench).
  • Qodo Command focuses on automation with integrity, offering code review, test generation, and documentation automation.
  • It includes a UI mode for reviewing code with Qodo Merge, intended to ensure quality and correctness in AI-assisted tasks.
  • Qodo Command is available for production use via npm install and aims to strengthen code-integrity workflows.