Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding
8 hours ago
- #agentic benchmarks
- #self-improving training
- #AI coding models
- Ornith-1.0 is a family of open-source models for agentic coding tasks, ranging from a 9B dense model to a 397B mixture-of-experts (MoE) model.
- It features a self-improving training framework that jointly learns to generate solutions and the scaffolds guiding them.
- Ornith-1.0-397B achieves state-of-the-art performance, matching or surpassing models such as Claude Opus 4.7 on benchmarks including Terminal-Bench 2.1 and SWE-Bench Verified.
- The smaller Ornith-1.0-9B and Ornith-1.0-35B outperform Gemma and Qwen models in their respective size categories.
- The training framework includes defenses against reward hacking through fixed boundaries, a deterministic monitor, and an LLM judge.
- Asynchronous RL training uses a pipeline-RL strategy with staleness weighting to manage long rollouts.
- Across the reported benchmarks, Ornith-1.0 models outperform other open-source models of comparable size.
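
The staleness weighting mentioned for asynchronous RL can be illustrated with a minimal sketch. The summary does not give the actual formula, so everything here is an assumption: the geometric decay, the `max_lag` cutoff, and the names `Rollout`, `staleness_weight`, and `weighted_advantages` are hypothetical and chosen only for illustration. The idea shown is that a rollout generated by an older policy version contributes less to the update than a fresh one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    policy_version: int  # learner version that generated this rollout
    advantage: float     # advantage estimate for the rollout


def staleness_weight(lag: int, decay: float = 0.5, max_lag: int = 4) -> float:
    """Hypothetical weight: 1.0 when fresh, geometric decay with lag, 0.0 past max_lag.

    The decay form and both parameters are assumptions, not the published method.
    """
    if lag < 0:
        raise ValueError("rollout cannot come from a future policy version")
    if lag > max_lag:
        return 0.0  # too stale: drop the rollout's contribution entirely
    return decay ** lag


def weighted_advantages(rollouts: List[Rollout], current_version: int) -> List[float]:
    """Scale each rollout's advantage by how far its policy lags the learner."""
    return [
        staleness_weight(current_version - r.policy_version) * r.advantage
        for r in rollouts
    ]


if __name__ == "__main__":
    batch = [Rollout(10, 2.0), Rollout(8, 2.0), Rollout(3, 2.0)]
    print(weighted_advantages(batch, current_version=10))
```

In a pipeline-style setup, where generation continues while the learner updates, long rollouts tend to finish several versions behind. A weight of this kind lets them still be used without dominating the gradient. Other choices, such as importance-ratio clipping, would serve a similar purpose.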