Hasty Briefs (beta)


DeepCoder: An Open-Source 14B Coder at O3-Mini Level

a year ago
  • #Reinforcement Learning
  • #AI
  • #Open Source
  • DeepCoder-14B-Preview is a 14B parameter code reasoning model developed by Agentica and Together AI, achieving 60.6% Pass@1 accuracy on LiveCodeBench.
  • The model is trained using reinforcement learning on 24K high-quality, verifiable coding problems over 2.5 weeks on 32 H100 GPUs.
  • Dataset curation involved rigorous filtering of coding problems from TACO Verified, PrimeIntellect’s SYNTHETIC-1, and LiveCodeBench to ensure quality.
  • A sparse outcome reward is used to avoid reward hacking: generated code earns reward only if it passes all sampled unit tests, with no partial credit.
  • Training optimizations include GRPO+ (a stable version of GRPO) and iterative context lengthening to improve reasoning over long contexts.
  • DeepCoder-14B-Preview demonstrates strong performance on coding benchmarks like LiveCodeBench and Codeforces, matching OpenAI’s o3-mini.
  • System optimizations like verl-pipeline reduce training time by 2.5x, enabling faster RL training for long-context models.
  • The project is fully open-source, including datasets, code, and training logs, to democratize RL training for LLMs.
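The sparse outcome reward described above can be sketched in a few lines. This is an illustrative stand-in, not the project's actual harness: the function name, the stdin/stdout test format, and the timeout value are all assumptions.

```python
import subprocess
import sys
import tempfile

def sparse_outcome_reward(code: str, tests: list[tuple[str, str]],
                          timeout_s: float = 5.0) -> float:
    """Binary outcome reward: 1.0 only if the generated program passes
    ALL sampled stdin/stdout unit tests, else 0.0. No partial credit,
    so gaming a subset of the tests yields nothing."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    for stdin_data, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, path],
                input=stdin_data, capture_output=True,
                text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # hung or too slow counts as failure
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return 0.0  # any crash or wrong answer zeroes the reward
    return 1.0
```

Because the reward is all-or-nothing over the sampled tests, a policy cannot farm reward by special-casing a few easy cases.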
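To make the GRPO+ bullet concrete, here is a minimal sketch of two ingredients common to GRPO-style training: group-relative advantages and a clipped surrogate with an asymmetric upper bound ("clip-high"). The epsilon values and function names are illustrative assumptions, not DeepCoder's exact recipe.

```python
import numpy as np

def group_advantages(rewards: np.ndarray) -> np.ndarray:
    """GRPO-style group-relative advantages: each sampled completion is
    scored against the mean (and spread) of its own sample group,
    removing the need for a learned value network."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def clipped_surrogate(ratios, advantages, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate with a looser upper clip bound
    (eps_high > eps_low), one stabilization used by GRPO+-like
    variants to keep exploration from collapsing; the epsilons here
    are illustrative, not the paper's values."""
    ratios = np.asarray(ratios, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    clipped = np.clip(ratios, 1.0 - eps_low, 1.0 + eps_high)
    return float(np.minimum(ratios * advantages, clipped * advantages).mean())
```

With a sparse 0/1 reward, the group normalization is what turns "some samples passed, some failed" into a usable learning signal within each group of rollouts.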