Self-Harness: Harnesses That Improve Themselves

8 hours ago

Self-Harness is a new paradigm where an LLM-based agent improves its own operating harness autonomously, without human engineers or stronger external agents.
The approach involves an iterative loop with three stages: Weakness Mining to identify model-specific failure patterns, Harness Proposal to generate minimal harness modifications, and Proposal Validation through regression testing.
Experiments on Terminal-Bench-2.0 with MiniMax M2.5, Qwen3.5-35B-A3B, and GLM-5 show held-out pass rates rising from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1% respectively, gains of 21.4, 14.3, and 14.2 percentage points.
Qualitative analysis indicates Self-Harness creates concrete, executable harness changes tailored to model-specific weaknesses, rather than adding generic instructions.
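
The three-stage loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Harness` class, the function names, the toy tasks, and the acceptance rule (no previously passing task may break, and at least one new task must pass) are all assumptions made for illustration. In a real system the `run` and `suggest` callables would invoke the agent and the model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Harness:
    """Hypothetical harness: just an ordered tuple of rules."""
    rules: tuple[str, ...] = ()


# Hypothetical signatures: run(harness, task) -> passed?, suggest(failures) -> rule
RunFn = Callable[[Harness, str], bool]
SuggestFn = Callable[[list], str]


def mine_weaknesses(harness: Harness, tasks: list[str], run: RunFn) -> list[str]:
    """Stage 1 (Weakness Mining): collect the tasks the current harness fails."""
    return [t for t in tasks if not run(harness, t)]


def propose(harness: Harness, failures: list[str], suggest: SuggestFn) -> Harness:
    """Stage 2 (Harness Proposal): add one minimal change aimed at the failures."""
    return Harness(harness.rules + (suggest(failures),))


def validate(old: Harness, new: Harness, tasks: list[str], run: RunFn) -> bool:
    """Stage 3 (Proposal Validation): assumed rule, no regressions plus a net gain."""
    old_pass = {t for t in tasks if run(old, t)}
    new_pass = {t for t in tasks if run(new, t)}
    return old_pass <= new_pass and len(new_pass) > len(old_pass)


def self_harness(harness: Harness, tasks: list[str], run: RunFn,
                 suggest: SuggestFn, max_rounds: int = 3) -> Harness:
    """Iterate mine -> propose -> validate, keeping only validated proposals."""
    for _ in range(max_rounds):
        failures = mine_weaknesses(harness, tasks, run)
        if not failures:
            break
        candidate = propose(harness, failures, suggest)
        if validate(harness, candidate, tasks, run):
            harness = candidate
    return harness


# Toy stand-ins for the agent and the model (purely illustrative).
REQUIRED = {
    "build": None,
    "long_job": "use a timeout",
    "edit": "re-read file after writing",
}


def toy_run(harness: Harness, task: str) -> bool:
    need = REQUIRED[task]
    return need is None or need in harness.rules


def toy_suggest(failures: list) -> str:
    return REQUIRED[failures[0]]


final = self_harness(Harness(), list(REQUIRED), toy_run, toy_suggest)
print(final.rules)
```

In this toy setup each round fixes one failing task, so the loop stops once every task passes. The validation step is what keeps a proposal from being accepted when it breaks a task that previously passed.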
