Ornith-1.0: Self-scaffolding LLMs for agentic coding
4 days ago
- #AI models
- #coding benchmarks
- #self-improving training
- Introduction of Ornith-1.0, a self-improving open-source model family for agentic coding tasks.
- Model variants include 9B Dense, 31B Dense, 35B MoE, and 397B MoE, built on pretrained Gemma 4 and Qwen 3.5.
- Key innovation: self-improving training framework where the model learns to generate both solution rollouts and task-specific harnesses.
- State-of-the-art performance: Ornith-1.0-397B matches or outperforms models such as Claude Opus 4.7, Minimax M3, and DeepSeek-V4-Pro on coding and agentic benchmarks.
- Ornith-1.0-9B delivers strong results for edge deployment, exceeding larger models like Gemma 4-31B.
- Reward hacking is addressed through fixed trust boundaries, deterministic monitoring, and frozen LLM judges.
- Asynchronous RL training with a pipeline-RL strategy and token-level GRPO loss.
- Detailed benchmark tables show Ornith-1.0 leading across multiple coding and agentic benchmarks.
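
The summary mentions a token-level GRPO loss but does not give its formula. The sketch below shows one common reading of that term, offered as an assumption rather than as Ornith-1.0's actual objective: rewards are normalized within a group of rollouts for the same task, and the clipped policy-gradient term is averaged over all tokens in the group instead of per rollout. The function names, the `clip_eps` value, and the omission of any KL penalty are illustrative choices, not details from the source.

```python
import math


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize rewards within one rollout group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def token_level_grpo_loss(new_logps, old_logps, rewards, clip_eps=0.2):
    """Clipped surrogate loss averaged over every token in the group.

    new_logps / old_logps: one list of per-token log-probabilities per rollout,
    under the current policy and the policy that generated the rollout.
    rewards: one scalar reward per rollout (hypothetical harness score).
    """
    advantages = group_advantages(rewards)
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(new_logps, old_logps, advantages):
        for new_lp, old_lp in zip(new_seq, old_seq):
            ratio = math.exp(new_lp - old_lp)  # importance ratio for this token
            clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
            # PPO-style pessimistic bound, applied per token
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    # Divide by the total token count, not by the number of rollouts
    return -total / n_tokens


# Two rollouts for one task: a 3-token success and a 1-token failure.
logps = [[-0.1, -0.2, -0.3], [-0.5]]
loss = token_level_grpo_loss(logps, logps, rewards=[1.0, 0.0])
```

In this toy case the loss is negative because the longer, successful rollout contributes three tokens and the failed one only one. Averaging each rollout's tokens first and then averaging across rollouts would cancel the two contributions exactly, which is the practical difference between token-level and sequence-level normalization.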