Ornith-1.0: Self-scaffolding LLMs for agentic coding

4 days ago

Introduces Ornith-1.0, a self-improving open-source model family for agentic coding tasks.
Model variants include 9B Dense, 31B Dense, 35B MoE, and 397B MoE, built on pretrained Gemma 4 and Qwen 3.5.
Key innovation: self-improving training framework where the model learns to generate both solution rollouts and task-specific harnesses.
State-of-the-art performance: Ornith-1.0-397B matches or outperforms models like Claude Opus 4.7, Minimax M3, and DeepSeek-V4-Pro on benchmarks.
Ornith-1.0-9B delivers strong results for edge deployment, exceeding larger models like Gemma 4-31B.
Reward hacking is addressed through fixed trust boundaries, deterministic monitoring, and frozen LLM judges.
Asynchronous RL training with a pipeline-RL strategy and token-level GRPO loss.
Detailed benchmark tables report leading results across multiple coding and agentic benchmarks.
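
For readers unfamiliar with the term, the sketch below illustrates one common reading of a "token-level GRPO loss". It is not taken from the Ornith-1.0 release: the function names, the clipping range of 0.2, and the choice to normalize by the total token count of the group are all assumptions made for illustration. Rewards are standardized within a group of rollouts for the same task, and the clipped policy-ratio objective is then averaged over every token in the group rather than per rollout.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts for the same task."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_level_grpo_loss(new_logps, old_logps, rewards, clip=0.2):
    """Clipped surrogate loss averaged over all tokens in the group.

    new_logps / old_logps: one list of per-token log-probabilities per
    rollout, under the current and the behavior policy respectively.
    rewards: one scalar reward per rollout.
    The names and the default clip value are hypothetical.
    """
    advantages = group_advantages(rewards)
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(new_logps, old_logps, advantages):
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)
            clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
            # Pessimistic (PPO-style) choice between raw and clipped terms.
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    # Token-level normalization: divide by the token count of the whole
    # group, so longer rollouts carry proportionally more weight.
    return -total / n_tokens


if __name__ == "__main__":
    logps = [[-1.0, -1.0, -1.0], [-2.0]]  # two rollouts, 3 tokens and 1 token
    print(token_level_grpo_loss(logps, logps, rewards=[1.0, 0.0]))
```

Under this reading, a three-token rollout contributes three terms to the average and a one-token rollout contributes one, which is the main difference from averaging each rollout first and then averaging across rollouts.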
