Hasty Briefs

Ornith-1.0: Self-scaffolding LLMs for agentic coding

4 days ago
  • #AI models
  • #coding benchmarks
  • #self-improving training
  • Introduction of Ornith-1.0, a self-improving open-source model family for agentic coding tasks.
  • Four model variants: 9B Dense, 31B Dense, 35B MoE, and 397B MoE, built on the pretrained Gemma 4 and Qwen 3.5 models.
  • Key innovation: self-improving training framework where the model learns to generate both solution rollouts and task-specific harnesses.
  • State-of-the-art performance: Ornith-1.0-397B matches or outperforms models such as Claude Opus 4.7, MiniMax M3, and DeepSeek-V4-Pro on the reported benchmarks.
  • Ornith-1.0-9B delivers strong results for edge deployment, outperforming larger models such as Gemma 4-31B.
  • Reward hacking is addressed through fixed trust boundaries, deterministic monitoring, and frozen LLM judges.
  • Asynchronous RL training with a pipeline-RL strategy and token-level GRPO loss.
  • Detailed benchmark tables report leading results across multiple coding and agentic benchmarks.
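
The brief names a "token-level GRPO loss" without giving its form. As a rough illustration only, the sketch below shows one common reading of the term: rewards are normalized within a group of rollouts for the same task, and the clipped policy-gradient objective is averaged over all tokens in the group rather than per sequence. The function names, the clip range of 0.2, and the absence of a KL term are assumptions for this example, not details confirmed by the source.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: z-score each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_level_grpo_loss(new_logps, old_logps, rewards, clip_eps=0.2):
    """Clipped surrogate loss averaged over every token in the group.

    new_logps / old_logps: one list of per-token log-probabilities per
    rollout, under the current and the sampling policy respectively.
    rewards: one scalar reward per rollout.
    clip_eps is a hypothetical default, not a value from the source.
    """
    advantages = group_advantages(rewards)
    total, n_tokens = 0.0, 0
    for seq_new, seq_old, adv in zip(new_logps, old_logps, advantages):
        for lp_new, lp_old in zip(seq_new, seq_old):
            ratio = math.exp(lp_new - lp_old)  # importance ratio per token
            clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("group contains no tokens")
    # Dividing by the total token count (not per sequence) is what makes
    # the normalization token-level: longer rollouts carry more weight.
    return -total / n_tokens


# Two rollouts for one task: a 3-token success and a 1-token failure.
logps = [[-0.1, -0.2, -0.3], [-0.5]]
loss = token_level_grpo_loss(logps, logps, rewards=[1.0, 0.0])
```

With identical old and new log-probabilities every ratio is 1, so the loss reduces to the negative token-weighted mean of the advantages. A per-sequence average would give zero here; the token-level average does not, because the longer successful rollout contributes three tokens to the failure's one.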