Hasty Briefs (beta)

Self-Harness: Harnesses That Improve Themselves

8 hours ago
  • #Autonomous Improvement
  • #LLM-based Agents
  • #Self-Harness
  • Self-Harness is a new paradigm where an LLM-based agent improves its own operating harness autonomously, without human engineers or stronger external agents.
  • The approach involves an iterative loop with three stages: Weakness Mining to identify model-specific failure patterns, Harness Proposal to generate minimal harness modifications, and Proposal Validation through regression testing.
  • Experiments on Terminal-Bench-2.0 with MiniMax M2.5, Qwen3.5-35B-A3B, and GLM-5 raise held-out pass rates from 40.5% to 61.9%, from 23.8% to 38.1%, and from 42.9% to 57.1%, respectively.
  • Qualitative analysis indicates Self-Harness creates concrete, executable harness changes tailored to model-specific weaknesses, rather than adding generic instructions.
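
The three-stage loop summarized above (Weakness Mining, Harness Proposal, Proposal Validation) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Every name here is hypothetical: the harness is modeled as a frozen set of rule strings, `run_task` and `propose` stand in for a real task runner and an LLM call, and the acceptance rule (no previously passing task may break, and at least one more task must pass) is one assumed reading of "regression testing".

```python
from typing import Callable, FrozenSet, List

# Hypothetical modeling choices, not taken from the source:
Harness = FrozenSet[str]                        # a harness as a set of rules
RunTask = Callable[[Harness, str], bool]        # True if the task passes
Propose = Callable[[Harness, List[str]], Harness]


def mine_weaknesses(harness: Harness, tasks: List[str], run_task: RunTask) -> List[str]:
    """Weakness Mining: collect the tasks that fail under the current harness."""
    return [t for t in tasks if not run_task(harness, t)]


def validate(candidate: Harness, baseline: Harness,
             tasks: List[str], run_task: RunTask) -> bool:
    """Proposal Validation (assumed rule): reject any regression, require a gain."""
    before = {t for t in tasks if run_task(baseline, t)}
    after = {t for t in tasks if run_task(candidate, t)}
    return before <= after and len(after) > len(before)


def self_harness(harness: Harness, tasks: List[str], run_task: RunTask,
                 propose: Propose, max_rounds: int = 3) -> Harness:
    """Iterate mine -> propose -> validate, keeping only validated proposals."""
    for _ in range(max_rounds):
        failures = mine_weaknesses(harness, tasks, run_task)
        if not failures:
            break                                # nothing left to fix
        candidate = propose(harness, failures)   # Harness Proposal (LLM stand-in)
        if validate(candidate, harness, tasks, run_task):
            harness = candidate                  # accept the modification
    return harness


# Toy stand-ins so the sketch runs end to end.
def toy_run_task(harness: Harness, task: str) -> bool:
    return task in harness                       # a task passes if a rule covers it


def toy_propose(harness: Harness, failures: List[str]) -> Harness:
    return harness | {failures[0]}               # minimal change: one new rule


def toy_bad_propose(harness: Harness, failures: List[str]) -> Harness:
    return frozenset({failures[0]})              # drops existing rules: a regression


if __name__ == "__main__":
    tasks = ["a", "b", "c"]
    print(sorted(self_harness(frozenset({"a"}), tasks, toy_run_task, toy_propose)))
```

In this toy setup each proposal adds a single rule, mirroring the "minimal harness modifications" idea, and the validation step discards `toy_bad_propose` because it breaks a task that previously passed. In practice the validation tasks would be kept separate from the held-out tasks used for reporting.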