A case study in testing with 100+ Claude agents in parallel
- #testing automation
- #AI-assisted software engineering
- #multi-agent workflows
- mngr is used to improve itself: its own demo script is exercised and refined through a multi-agent testing workflow.
- The process starts with a tutorial script divided into blocks, each converted into pytest functions and assigned to an agent for execution, debugging, fixing, and improvement.
- Coding agents generate examples from tutorial blocks; each attempt either produces a good example or surfaces interface issues that feed back into refining mngr.
- Tutorial blocks are transformed into pytest functions with a 1:N correspondence to cover various scenarios, and agents cite tutorial blocks for traceability.
- A test framework built on Python's subprocess module allows concise test functions and generates CLI transcripts and TUI recordings via tools like asciinema.
- Tests are orchestrated by collecting test names, launching agents to fix or improve each test, pulling results, and integrating changes into a single PR.
- Integration separates implementation fixes from non-implementation fixes: the latter are merged directly, while the former are ranked for review.
- The workflow was developed locally with 10 agents and scaled to 100 agents on Modal by changing mngr create commands, maintaining consistency across environments.
- mngr enables building custom map-reduce-like pipelines using its primitives, supporting both small-scale local runs and large-scale remote deployments without upfront costs.
- The tool emphasizes scalability in both directions (up and down), allowing teams to start quickly locally and scale as needed, aligning with Imbue's mission to make tech serve humans.
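The 1:N block-to-test mapping with traceability citations described above might look like this in plain pytest style. This is a sketch, not mngr's actual suite: the block ID, the commands, and the test names are all hypothetical.

```python
import subprocess
import sys

# Hypothetical tutorial-block citation: each test names the block it covers,
# so a failing test traces back to the exact tutorial text.
TUTORIAL_BLOCK = "tutorial.md#block-03-run-the-demo"

def run_cli(*args: str) -> subprocess.CompletedProcess:
    """Run a command and capture its output as text."""
    return subprocess.run(args, capture_output=True, text=True, timeout=60)

def test_block03_happy_path():
    """Covers tutorial.md#block-03-run-the-demo: the command as shown."""
    # Placeholder command standing in for the tutorial's real invocation.
    result = run_cli(sys.executable, "-c", "print('demo ok')")
    assert result.returncode == 0
    assert "demo ok" in result.stdout

def test_block03_failure_variant():
    """Covers the same block: a malformed variant should fail cleanly."""
    result = run_cli(sys.executable, "-c", "raise SystemExit(2)")
    assert result.returncode == 2
```

One tutorial block fans out into several tests (happy path, failure variants), and the shared citation string keeps the link back to the tutorial explicit.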
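A minimal sketch of a subprocess-based harness that keeps test functions concise and renders a CLI transcript per run. The function names are hypothetical, and the TUI recording side (asciinema) is omitted here.

```python
import shlex
import subprocess

def run(cmd: str, timeout: int = 60) -> subprocess.CompletedProcess:
    """Run a shell-style command string and capture stdout/stderr as text."""
    return subprocess.run(shlex.split(cmd), capture_output=True,
                          text=True, timeout=timeout)

def transcript(cmd: str) -> str:
    """Render the command and its output the way a reviewer would read it."""
    result = run(cmd)
    return f"$ {cmd}\n{result.stdout}{result.stderr}"

# A concise test in this style: one call, plain assertions.
def test_echo_roundtrip():
    result = run("echo hello")
    assert result.returncode == 0
    assert result.stdout.strip() == "hello"
```

Because every test goes through one helper, saving transcripts for review is a single hook rather than per-test boilerplate.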
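The collect → launch → pull → integrate loop can be sketched as a small map-reduce pipeline. Everything here is a stub under stated assumptions: the real workflow launches sandboxed agents through mngr rather than calling a local function, and the test names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def collect_test_names() -> list[str]:
    # Stand-in for enumerating the generated pytest functions.
    return [f"test_block_{i:02d}" for i in range(1, 6)]

def run_agent_on(test_name: str) -> dict:
    # Stub for "launch an agent to fix or improve this test";
    # a real run would create one agent environment per test.
    return {"test": test_name, "status": "fixed",
            "diff": f"patch for {test_name}"}

def integrate(results: list[dict]) -> list[str]:
    # Reduce step: gather successful patches for a single PR.
    return [r["diff"] for r in results if r["status"] == "fixed"]

def pipeline(max_workers: int = 10) -> list[str]:
    tests = collect_test_names()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_agent_on, tests))
    return integrate(results)
```

Scaling from 10 to 100 agents is then a matter of where the map step runs, not a change to the pipeline's shape.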
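The integration-triage step, which splits agent patches into directly-mergeable and review-needed sets, might look like the following. The record fields `touches_implementation` and `score` are hypothetical, chosen only to illustrate the split-and-rank idea.

```python
def triage(patches: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split patches: non-implementation fixes merge directly,
    implementation fixes are ranked for human review."""
    to_review = [p for p in patches if p["touches_implementation"]]
    to_merge = [p for p in patches if not p["touches_implementation"]]
    # Rank implementation changes so reviewers see the most promising first.
    to_review.sort(key=lambda p: p["score"], reverse=True)
    return to_review, to_merge

# Example usage with hypothetical patch records:
patches = [
    {"id": 1, "touches_implementation": True, "score": 0.4},
    {"id": 2, "touches_implementation": False, "score": 0.9},
    {"id": 3, "touches_implementation": True, "score": 0.8},
]
to_review, to_merge = triage(patches)
```

Keeping the merge path automatic for low-risk changes is what lets a single PR absorb output from 100 agents without drowning the reviewer.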