Show HN: Zenith: SOTA harness for normal models to beat Fable on FrontierSWE
18 hours ago
- #AI Agents
- #Benchmark Performance
- #Software Engineering
- Zenith is an agent harness that builds custom harnesses for long-running engineering tasks, driving them to completion through planning, testing, and improvement.
- On FrontierSWE, Zenith moved GPT-5.5 from 5th to 1st place by optimizing the harness around the model rather than switching to a larger model, suggesting that harness design can matter more than model size for frontier performance.
- Access to top models like Claude Fable 5 and GPT-5.6 is restricted by export controls and limited previews, so improving the system around accessible models becomes crucial for performance.
- Zenith employs adaptive self-improvement, adjusting workers, testing, skills, and strategies during runs without rewriting its own code, enabling legible and efficient task execution.
- Meta-Zenith automates harness construction by learning from task feedback, generating tailored orchestrator-worker systems with prompts, milestones, validators, and policies for new tasks.
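The orchestrator-worker idea in the last two points can be illustrated with a minimal sketch. It is not Zenith's code or API: every name here (`Milestone`, `Policy`, `run_harness`, `toy_worker`) is hypothetical, and the worker is a stub standing in for a model call. The sketch assumes a harness can be described as data (prompts, milestones, validators, a policy) and that adaptation during a run means updating that data from validator feedback while the code stays fixed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout; none of this is taken from Zenith itself.


@dataclass
class Milestone:
    name: str
    prompt: str                        # instruction handed to the worker
    validator: Callable[[str], bool]   # accepts or rejects the worker's output


@dataclass
class Policy:
    max_attempts: int = 3                           # retries allowed per milestone
    hints: List[str] = field(default_factory=list)  # feedback gathered during the run


Worker = Callable[[str], str]


def run_harness(milestones: List[Milestone], worker: Worker, policy: Policy) -> Dict[str, bool]:
    """Drive each milestone in order, adapting the policy (data), never the code."""
    results: Dict[str, bool] = {}
    for m in milestones:
        passed = False
        for attempt in range(1, policy.max_attempts + 1):
            # The prompt is rebuilt each attempt so new hints take effect immediately.
            prompt = "\n".join([m.prompt, *policy.hints])
            output = worker(prompt)
            if m.validator(output):
                passed = True
                break
            # Adapt: record the failure so later prompts carry the feedback.
            policy.hints.append(f"Attempt {attempt} at '{m.name}' failed validation; revise.")
        results[m.name] = passed
    return results


def toy_worker(prompt: str) -> str:
    # Stand-in for a model call: succeeds only after seeing failure feedback.
    return "tests pass" if "failed validation" in prompt else "draft"


milestones = [
    Milestone("implement", "Implement the feature.", lambda out: out == "tests pass"),
    Milestone("document", "Write the docs.", lambda out: len(out) > 0),
]
policy = Policy()
results = run_harness(milestones, toy_worker, policy)
print(results)
```

Keeping the adaptive state in `Policy` rather than in the control flow is one way to keep a run legible: the accumulated hints are a plain list that can be inspected afterwards to see what changed and why.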