Evaluating Agent-Based Program Repair at Google

Agent-based program repair uses LLMs to automatically fix complex bugs by combining planning, tool use, and code generation.
The paper evaluates agent-based repair in an enterprise context using 178 bugs from Google's issue tracking system (78 human-reported, 100 machine-reported).
Passerine, an agent similar to SWE-Agent, achieves a 73% plausible patch rate for machine-reported bugs and 25.6% for human-reported bugs using Gemini 1.5 Pro.
Manual examination shows that for 43% of machine-reported bugs and 17.9% of human-reported bugs, the agent produced a patch that is semantically equivalent to the ground-truth patch.
The study highlights differences in bug distribution (language diversity, size, spread of changes) between Google's dataset and the open-source SWE-Bench.
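
To make the percentages above easier to compare, the following minimal sketch converts them into approximate bug counts. The counts and the pooled rates across all 178 bugs are derived here by rounding and are assumptions, not figures reported by the paper; the names `BUGS`, `PLAUSIBLE_RATE`, `EQUIVALENT_RATE`, and `implied_counts` are illustrative.

```python
# Figures taken from the summary: bugs per category and reported rates.
BUGS = {"machine": 100, "human": 78}
PLAUSIBLE_RATE = {"machine": 0.73, "human": 0.256}
EQUIVALENT_RATE = {"machine": 0.43, "human": 0.179}


def implied_counts(rates):
    """Convert per-category rates into whole-bug counts by rounding.

    The results are derived estimates, not numbers stated in the paper.
    """
    return {kind: round(rates[kind] * BUGS[kind]) for kind in BUGS}


plausible = implied_counts(PLAUSIBLE_RATE)
equivalent = implied_counts(EQUIVALENT_RATE)
total = sum(BUGS.values())

# Pooled rates over the whole dataset (derived, assuming the rounded counts).
pooled_plausible = sum(plausible.values()) / total
pooled_equivalent = sum(equivalent.values()) / total

print(f"plausible patches:  {plausible}, pooled {pooled_plausible:.1%}")
print(f"equivalent patches: {equivalent}, pooled {pooled_equivalent:.1%}")
```

Under these assumptions, the gap between the plausible and equivalent counts in each category is the number of bugs whose patches passed the tests but did not match the ground-truth fix on manual review.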
