Hasty Briefs (beta)

Agentic Harness Engineering

6 hours ago
  • #coding-agents
  • #automation
  • #observability
  • Agentic Harness Engineering (AHE) automates the evolution of coding-agent harnesses by using observability-driven methods.
  • It addresses challenges like heterogeneous action spaces, voluminous trajectories, and attribution difficulties through three pillars: component, experience, and decision observability.
  • AHE improves pass@1 on Terminal-Bench 2 from 69.7% to 77.0%, outperforming both the human-designed Codex-CLI harness and the self-evolving baselines ACE and TF-GRPO.
  • The evolved harness transfers beyond the setting it was evolved in: to another benchmark (SWE-bench-verified) and, on Terminal-Bench 2, to other model families, showing cross-model-family gains and reduced token usage.
  • Ablations indicate that the improvements stem from tools, middleware, and long-term memory rather than from system prompts, suggesting that factual harness structures generalize.
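
To put the reported pass@1 figures in context, here is a minimal Python sketch of the arithmetic. It assumes the usual reading of pass@1 as the share of tasks solved on the first attempt. The helper names `pass_at_1` and `improvement` are hypothetical and are not taken from the AHE work. For the Terminal-Bench 2 numbers above, the gain comes to 7.3 percentage points, or about 10.5% relative to the starting score.

```python
def pass_at_1(first_attempt_results):
    """Percentage of tasks solved on the first attempt.

    `first_attempt_results` is a list of booleans, one per task
    (hypothetical input format, chosen for illustration only).
    """
    if not first_attempt_results:
        raise ValueError("need at least one task result")
    return 100.0 * sum(first_attempt_results) / len(first_attempt_results)


def improvement(before_pct, after_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after_pct - before_pct
    relative = 100.0 * absolute / before_pct
    return round(absolute, 1), round(relative, 1)


# Figures reported in the summary for Terminal-Bench 2.
abs_gain, rel_gain = improvement(69.7, 77.0)
print(f"absolute gain: {abs_gain} pp, relative gain: {rel_gain}%")
```

Keeping the absolute and relative gains separate matters when comparing harnesses that start from different baselines, since the same percentage-point gain is a larger relative change from a lower starting score.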