Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

8 hours ago

Ornith-1.0 is a family of open-source models for agentic coding tasks, ranging from a 9B dense model to a 397B mixture-of-experts (MoE) model.
It features a self-improving training framework that jointly learns to generate solutions and the scaffolds guiding them.
Ornith-1.0-397B achieves state-of-the-art performance, matching or surpassing models such as Claude Opus 4.7 on benchmarks including Terminal-Bench 2.1 and SWE-Bench Verified.
The smaller Ornith-1.0-9B and Ornith-1.0-35B outperform Gemma and Qwen models in their respective size categories.
The training framework includes defenses against reward hacking through fixed boundaries, a deterministic monitor, and an LLM judge.
Asynchronous RL training uses a pipeline-RL strategy with staleness weighting to manage long rollouts.
Across the reported benchmarks, Ornith-1.0 models outperform other open-source models of comparable size.
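
The staleness weighting mentioned for asynchronous RL can be illustrated with a minimal sketch. The brief does not give the actual formula, so everything here is an assumption for illustration: the exponential decay, the hard cutoff, and the names `Rollout`, `staleness_weight`, `decay`, and `max_staleness` are hypothetical and not taken from Ornith-1.0. The idea shown is only that a rollout sampled by an older policy version contributes less to the current update.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    policy_version: int  # learner step whose weights sampled this rollout
    advantage: float     # advantage estimate attached to the rollout


def staleness_weight(current_version: int, rollout_version: int,
                     decay: float = 0.5, max_staleness: int = 4) -> float:
    """Hypothetical weighting: exponential decay in the version gap.

    Rollouts older than max_staleness versions are dropped (weight 0).
    """
    staleness = current_version - rollout_version
    if staleness < 0:
        raise ValueError("rollout cannot come from a future policy version")
    if staleness > max_staleness:
        return 0.0
    return decay ** staleness


def weighted_advantages(rollouts: List[Rollout], current_version: int) -> List[float]:
    """Scale each rollout's advantage by its staleness weight."""
    return [
        r.advantage * staleness_weight(current_version, r.policy_version)
        for r in rollouts
    ]
```

With these assumed defaults, a rollout from the current policy version keeps its full advantage, one that is a single version behind is halved, and one more than four versions behind is ignored. Long rollouts that finish after several learner updates therefore still contribute, but with reduced influence.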
