Hasty Briefs

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

8 hours ago
  • #agentic benchmarks
  • #self-improving training
  • #AI coding models
  • Ornith-1.0 is a family of open-source models for agentic coding tasks, ranging from a 9B dense model to a 397B mixture-of-experts (MoE) model.
  • It features a self-improving training framework that jointly learns to generate solutions and the scaffolds guiding them.
  • Ornith-1.0-397B achieves state-of-the-art performance, matching or surpassing models such as Claude Opus 4.7 on benchmarks including Terminal-Bench 2.1 and SWE-Bench Verified.
  • Smaller models like Ornith-1.0-9B and Ornith-1.0-35B outperform larger counterparts such as Gemma and Qwen in their size categories.
  • The training framework includes defenses against reward hacking through fixed boundaries, a deterministic monitor, and an LLM judge.
  • Asynchronous RL training uses a pipeline-RL strategy with staleness weighting to manage long rollouts.
  • Across the reported benchmarks, Ornith-1.0 models outperform other open-source models of comparable size.
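
The summary names three layers of defense against reward hacking (fixed boundaries, a deterministic monitor, and an LLM judge) but does not say how each one works. The sketch below is one plausible reading, not the authors' implementation: the protected path prefixes, the forbidden patterns, the `judge` callable, and every function name are hypothetical. It shows only the gating idea, where the task reward is paid out only if all three checks pass.

```python
from typing import Callable, Iterable

# Hypothetical "fixed boundary": paths the agent may never modify.
PROTECTED_PREFIXES = ("tests/", ".ci/")

# Hypothetical patterns a deterministic monitor might flag in a diff.
FORBIDDEN_PATTERNS = ("pytest.skip", "sys.exit(0)")


def violates_boundary(modified_paths: Iterable[str]) -> bool:
    """Return True if any modified path falls under a protected prefix."""
    return any(path.startswith(PROTECTED_PREFIXES) for path in modified_paths)


def monitor_flags(diff_text: str) -> bool:
    """Return True if the diff contains any forbidden pattern (rule-based check)."""
    return any(pattern in diff_text for pattern in FORBIDDEN_PATTERNS)


def gated_reward(
    task_reward: float,
    modified_paths: Iterable[str],
    diff_text: str,
    judge: Callable[[str], bool],
) -> float:
    """Pay out task_reward only if all three checks pass; otherwise return 0.0.

    `judge` stands in for an LLM judge: it takes the diff and returns True
    when the solution looks legitimate. Cheap checks run first, so the
    judge is only called on rollouts that survive the rule-based layers.
    """
    if violates_boundary(modified_paths):
        return 0.0
    if monitor_flags(diff_text):
        return 0.0
    if not judge(diff_text):
        return 0.0
    return task_reward
```

Ordering the checks from cheapest to most expensive is a design choice of this sketch; the source does not state how the layers are combined.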
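
The staleness weighting mentioned above can be illustrated with a small sketch. In asynchronous RL, a long rollout may finish after the policy has been updated several times, so its data is "stale" relative to the current weights. The summary does not give the weighting formula; the exponential decay, the `decay` and `max_staleness` parameters, and all names below are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    policy_version: int  # version of the policy that generated this rollout
    advantage: float     # advantage estimate for this rollout


def staleness_weight(
    current_version: int,
    rollout_version: int,
    decay: float = 0.5,
    max_staleness: int = 4,
) -> float:
    """Down-weight a rollout by how many policy updates it lags behind.

    Assumed scheme: weight = decay ** lag, and rollouts that lag by more
    than max_staleness updates are dropped (weight 0.0).
    """
    lag = current_version - rollout_version
    if lag < 0:
        raise ValueError("rollout cannot come from a future policy version")
    if lag > max_staleness:
        return 0.0
    return decay ** lag


def weighted_advantages(rollouts: List[Rollout], current_version: int) -> List[float]:
    """Scale each rollout's advantage by its staleness weight."""
    return [
        r.advantage * staleness_weight(current_version, r.policy_version)
        for r in rollouts
    ]
```

Under these assumptions, a rollout generated by the current policy keeps its full advantage, one that lags by a single update contributes half as much, and one that lags by more than four updates is ignored.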