SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via CI

19 hours ago
  • #Continuous Integration
  • #Software Engineering
  • #LLM Agents
  • LLM-powered agents show strong capabilities in automating software engineering tasks like static bug fixing.
  • SWE-CI is a new benchmark focusing on dynamic, long-term maintainability of codebases, moving beyond static, short-term functional correctness.
  • The benchmark includes 100 tasks; on average, each task covers 233 days and 71 commits of history in a real-world repository.
  • Agents are required to resolve tasks through multiple rounds of analysis and coding iterations.
  • SWE-CI provides insights into agents' ability to maintain code quality over long-term evolution.
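
The multi-round workflow described above can be illustrated with a toy loop. This is a minimal sketch, not SWE-CI's actual harness, which this summary does not describe: the names `run_ci`, `maintain`, `toy_agent`, `CHECKS`, and `FIXES` are all hypothetical, the "codebase" is just a dictionary of functions, and the "agent" is a stub that repairs one failing check per round.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a codebase maps names to functions,
# and a check returns True when the codebase passes it.
Codebase = Dict[str, Callable[[int], int]]
Check = Callable[[Codebase], bool]
Agent = Callable[[Codebase, List[str]], Codebase]


def run_ci(codebase: Codebase, checks: Dict[str, Check]) -> List[str]:
    """Run every check and return the names of the failing ones."""
    return [name for name, check in checks.items() if not check(codebase)]


def maintain(
    codebase: Codebase,
    checks: Dict[str, Check],
    agent: Agent,
    max_rounds: int = 5,
) -> Tuple[Codebase, List[List[str]]]:
    """Alternate CI runs and agent patches until CI is green or rounds run out."""
    history: List[List[str]] = []
    for _ in range(max_rounds):
        failures = run_ci(codebase, checks)
        history.append(failures)
        if not failures:
            break
        codebase = agent(codebase, failures)
    return codebase, history


# Toy checks and the (assumed) correct implementations the stub agent can apply.
CHECKS: Dict[str, Check] = {
    "double": lambda cb: cb["double"](3) == 6,
    "square": lambda cb: cb["square"](3) == 9,
}
FIXES: Codebase = {"double": lambda x: 2 * x, "square": lambda x: x * x}


def toy_agent(codebase: Codebase, failures: List[str]) -> Codebase:
    """Stub agent: repair only the first failing check each round."""
    target = failures[0]
    patched = dict(codebase)
    patched[target] = FIXES[target]
    return patched


broken: Codebase = {"double": lambda x: x, "square": lambda x: x}
final, history = maintain(broken, CHECKS, toy_agent)
print(history)
```

Recording the failures from every round, rather than only the final state, is a deliberate choice here: it lets an evaluator see how the codebase evolved across iterations instead of judging a single snapshot.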