My Participation in the METR AI Productivity Study

10 months ago

The METR study found that developers using AI took 19% longer to complete tasks (N=246 tasks; 95% CI [-40%, -2%], expressed as a change in speed, so negative values indicate a slowdown).
The study involved a randomized controlled trial with developers working on tasks with and without AI assistance.
The author participated in the study, working on the jsdom project, which has over 1 million lines of code.
Tasks included bug fixes, feature implementations, and test coverage improvements, with 9 AI-allowed and 10 no-AI tasks.
The AI tools used included Cursor’s agent mode, Claude Code, and Gemini.
The models struggled to match the existing codebase's style, to handle repetitive tasks, and to implement web specifications accurately.
Although AI-assisted tasks felt engaging, they were not faster, because of frequent missteps and the need for constant oversight.
The author suggests parallel-agents mode as a more promising approach for future AI-assisted productivity.
Large, established codebases pose unique challenges for AI tools compared to greenfield projects.
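
As a rough illustration of the study design summarized above, the sketch below randomly labels tasks as AI-allowed or no-AI and then computes a naive "percent longer with AI" figure from completion times. It is a minimal sketch under stated assumptions, not METR's analysis: the per-task coin flip, the ratio of geometric means, the function names, and the completion times are all hypothetical and chosen only for illustration.

```python
import math
import random


def assign_conditions(task_ids, seed=0):
    """Label each task 'ai' or 'no_ai' with an independent coin flip.

    Assumption: per-task randomization; the study's exact procedure
    is not described in this summary.
    """
    rng = random.Random(seed)
    return {task: rng.choice(["ai", "no_ai"]) for task in task_ids}


def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_longer_with_ai(times_by_condition):
    """Naive estimate of how much longer AI-allowed tasks took, in percent.

    Uses a ratio of geometric means, a stand-in for whatever model
    the study actually fitted.
    """
    ratio = (geometric_mean(times_by_condition["ai"])
             / geometric_mean(times_by_condition["no_ai"]))
    return (ratio - 1.0) * 100.0


# Made-up completion times in minutes, for illustration only.
example_times = {
    "ai": [60.0, 120.0, 240.0],
    "no_ai": [50.0, 100.0, 200.0],
}

if __name__ == "__main__":
    labels = assign_conditions(range(19), seed=1)
    print(sum(1 for v in labels.values() if v == "ai"), "of 19 tasks AI-allowed")
    print(round(percent_longer_with_ai(example_times), 1), "percent longer with AI")
```

With the made-up times, each AI-allowed task takes 1.2 times as long as its no-AI counterpart, so the estimate comes out at about 20% longer. A positive "percent longer" corresponds to a negative speedup, which is one way to read a confidence interval that lies entirely below zero.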

Hasty Briefs (beta)