A case study in testing with 100+ Claude agents in parallel
- #testing automation
- #AI-assisted software engineering
- #multi-agent workflows
- mngr is used to improve itself: its own demo script is exercised and refined through a multi-agent testing workflow.
- The process starts with a tutorial script divided into blocks, each converted into pytest functions and assigned to an agent for execution, debugging, fixing, and improvement.
- Coding agents generate examples from tutorial blocks; each attempt either produces a good example or surfaces interface issues that feed back into refining mngr.
- Tutorial blocks are transformed into pytest functions with a 1:N correspondence to cover various scenarios, and agents cite tutorial blocks for traceability.
- A test framework built on Python's subprocess module allows concise test functions and generates CLI transcripts and TUI recordings via tools like asciinema.
- Tests are orchestrated by collecting test names, launching agents to fix or improve each test, pulling results, and integrating changes into a single PR.
- Integration separates implementation fixes from non-implementation fixes: the latter are merged directly, while the former are ranked for review.
- The workflow was developed locally with 10 agents and scaled to 100 agents on Modal by changing mngr create commands, maintaining consistency across environments.
- mngr enables building custom map-reduce-like pipelines using its primitives, supporting both small-scale local runs and large-scale remote deployments without upfront costs.
- The tool emphasizes scalability in both directions (up and down), allowing teams to start quickly locally and scale as needed, aligning with Imbue's mission to make tech serve humans.
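The 1:N block-to-test mapping with traceability citations described above might look like this in plain pytest style. This is a sketch, not mngr's actual suite: the block ID, the commands, and the test names are all hypothetical.

```python
import subprocess
import sys

# Hypothetical tutorial-block citation: each test names the block it covers,
# so a failing test traces back to the exact tutorial text.
TUTORIAL_BLOCK = "tutorial.md#block-03-run-the-demo"

def run_cli(*args: str) -> subprocess.CompletedProcess:
    """Run a command and capture its output as text."""
    return subprocess.run(args, capture_output=True, text=True, timeout=60)

def test_block03_happy_path():
    """Covers tutorial.md#block-03-run-the-demo: the command as shown."""
    # Placeholder command standing in for the tutorial's real invocation.
    result = run_cli(sys.executable, "-c", "print('demo ok')")
    assert result.returncode == 0
    assert "demo ok" in result.stdout

def test_block03_failure_variant():
    """Covers the same block: a malformed variant should fail cleanly."""
    result = run_cli(sys.executable, "-c", "raise SystemExit(2)")
    assert result.returncode == 2
```

One tutorial block fans out into several tests (happy path, failure variants), and the shared citation string keeps the link back to the tutorial explicit.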
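A minimal sketch of a subprocess-based harness that keeps test functions concise and renders a CLI transcript per run. The function names are hypothetical, and the TUI recording side (asciinema) is omitted here.

```python
import shlex
import subprocess

def run(cmd: str, timeout: int = 60) -> subprocess.CompletedProcess:
    """Run a shell-style command string and capture stdout/stderr as text."""
    return subprocess.run(shlex.split(cmd), capture_output=True,
                          text=True, timeout=timeout)

def transcript(cmd: str) -> str:
    """Render the command and its output the way a reviewer would read it."""
    result = run(cmd)
    return f"$ {cmd}\n{result.stdout}{result.stderr}"

# A concise test in this style: one call, plain assertions.
def test_echo_roundtrip():
    result = run("echo hello")
    assert result.returncode == 0
    assert result.stdout.strip() == "hello"
```

Because every test goes through one helper, saving transcripts for review is a single hook rather than per-test boilerplate.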
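The collect → launch → pull → integrate loop can be sketched as a small map-reduce pipeline. Everything here is a stub under stated assumptions: the real workflow launches sandboxed agents through mngr rather than calling a local function, and the test names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def collect_test_names() -> list[str]:
    # Stand-in for enumerating the generated pytest functions.
    return [f"test_block_{i:02d}" for i in range(1, 6)]

def run_agent_on(test_name: str) -> dict:
    # Stub for "launch an agent to fix or improve this test";
    # a real run would create one agent environment per test.
    return {"test": test_name, "status": "fixed",
            "diff": f"patch for {test_name}"}

def integrate(results: list[dict]) -> list[str]:
    # Reduce step: gather successful patches for a single PR.
    return [r["diff"] for r in results if r["status"] == "fixed"]

def pipeline(max_workers: int = 10) -> list[str]:
    tests = collect_test_names()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_agent_on, tests))
    return integrate(results)
```

Scaling from 10 to 100 agents is then a matter of where the map step runs, not a change to the pipeline's shape.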
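The integration-triage step, which splits agent patches into directly-mergeable and review-needed sets, might look like the following. The record fields `touches_implementation` and `score` are hypothetical, chosen only to illustrate the split-and-rank idea.

```python
def triage(patches: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split patches: non-implementation fixes merge directly,
    implementation fixes are ranked for human review."""
    to_review = [p for p in patches if p["touches_implementation"]]
    to_merge = [p for p in patches if not p["touches_implementation"]]
    # Rank implementation changes so reviewers see the most promising first.
    to_review.sort(key=lambda p: p["score"], reverse=True)
    return to_review, to_merge

# Example usage with hypothetical patch records:
patches = [
    {"id": 1, "touches_implementation": True, "score": 0.4},
    {"id": 2, "touches_implementation": False, "score": 0.9},
    {"id": 3, "touches_implementation": True, "score": 0.8},
]
to_review, to_merge = triage(patches)
```

Keeping the merge path automatic for low-risk changes is what lets a single PR absorb output from 100 agents without drowning the reviewer.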