Agentic Harness Engineering
6 hours ago
- #coding-agents
- #automation
- #observability
- Agentic Harness Engineering (AHE) automates the evolution of coding-agent harnesses by using observability-driven methods.
- It addresses challenges like heterogeneous action spaces, voluminous trajectories, and attribution difficulties through three pillars: component, experience, and decision observability.
- AHE improves pass@1 on Terminal-Bench 2 from 69.7% to 77.0% (a gain of 7.3 percentage points), outperforming the human-designed Codex-CLI harness and the self-evolving baselines ACE and TF-GRPO.
- The evolved harness transfers beyond the setting it was evolved in: to a different benchmark (SWE-bench-verified) and, on Terminal-Bench 2, to other model families, showing gains alongside reduced token usage.
- Ablations indicate that the improvements stem from tools, middleware, and long-term memory rather than from system prompts, suggesting that factual harness structure is what generalizes.
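
For readers unfamiliar with the metric, the headline numbers above can be read as follows. This is a minimal sketch, assuming pass@1 means the fraction of tasks solved with a single attempt per task; the helper names `pass_at_1` and `gain_in_points` and the toy outcome list are hypothetical and do not come from the AHE work.

```python
from typing import Sequence


def pass_at_1(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks solved, given one attempt per task (assumed definition)."""
    if not outcomes:
        raise ValueError("need at least one task outcome")
    return sum(outcomes) / len(outcomes)


def gain_in_points(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(after_pct - before_pct, 1)


# Hypothetical toy run: four tasks, three solved on the first attempt.
toy_outcomes = [True, False, True, True]
toy_score = pass_at_1(toy_outcomes)

# The two percentages reported in the summary above.
reported_gain = gain_in_points(69.7, 77.0)

print(f"toy pass@1: {toy_score:.2f}")
print(f"reported gain: {reported_gain} percentage points")
```

Note that the gain is stated in percentage points (an absolute difference), not as a relative improvement over the 69.7% starting score.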