Hasty Briefs (beta)

Show HN: Zenith: sota harness for normal models to beat Fable on FrontierSWE

18 hours ago
  • #AI Agents
  • #Benchmark Performance
  • #Software Engineering
  • Zenith is an agent harness that builds custom harnesses for long-running engineering tasks, driving them to completion through planning, testing, and improvement.
  • On FrontierSWE, Zenith moved GPT-5.5 from 5th to 1st place by optimizing the harness around the model rather than switching to a larger model, suggesting that harness design can matter more than model size for frontier performance.
  • Access to top models like Claude Fable 5 and GPT-5.6 is restricted due to export controls and limited previews, making system improvements around accessible models crucial for performance.
  • Zenith employs adaptive self-improvement: it adjusts workers, testing, skills, and strategies during a run without rewriting its own code, which keeps task execution legible and efficient.
  • Meta-Zenith automates harness construction by learning from task feedback, generating tailored orchestrator-worker systems with prompts, milestones, validators, and policies for new tasks.
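
The orchestrator-worker pattern with milestones and validators described above can be illustrated with a small sketch. This is not Zenith's code: `Milestone`, `run_harness`, and `toy_worker` are hypothetical names invented here, and the retry-with-feedback policy is an assumption about how a validator might gate progress, not a description of the actual system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Milestone:
    """Hypothetical unit of work: a name plus a validator for the worker's output."""
    name: str
    validator: Callable[[str], bool]  # True when the output is acceptable


@dataclass
class RunLog:
    """Records (milestone, attempt number, passed) so the run stays inspectable."""
    attempts: List[Tuple[str, int, bool]] = field(default_factory=list)


def run_harness(
    milestones: List[Milestone],
    worker: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[bool, RunLog]:
    """Drive each milestone in order, retrying with feedback when validation fails."""
    log = RunLog()
    for milestone in milestones:
        feedback = ""
        for attempt in range(1, max_attempts + 1):
            output = worker(milestone.name, feedback)
            passed = milestone.validator(output)
            log.attempts.append((milestone.name, attempt, passed))
            if passed:
                break
            # Assumed policy: pass the failure back to the worker as plain text.
            feedback = f"attempt {attempt} failed validation for {milestone.name}"
        else:
            # No attempt passed, so the whole run stops at this milestone.
            return False, log
    return True, log


def toy_worker(milestone: str, feedback: str) -> str:
    """Stand-in for a model call: 'implement' succeeds only after feedback."""
    if milestone == "implement" and not feedback:
        return "draft"
    return "done"


milestones = [
    Milestone("plan", lambda out: out == "done"),
    Milestone("implement", lambda out: out == "done"),
]
ok, log = run_harness(milestones, toy_worker)
```

In this toy run the "implement" milestone fails once, receives feedback, and passes on the second attempt. A real harness would replace `toy_worker` with model calls and the lambdas with actual test suites or checks.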