DeepCoder: An Open-Source 14B Coder at O3-Mini Level
- #Reinforcement Learning
- #AI
- #Open Source
- DeepCoder-14B-Preview is a 14B parameter code reasoning model developed by Agentica and Together AI, achieving 60.6% Pass@1 accuracy on LiveCodeBench.
- The model is trained using reinforcement learning on 24K high-quality, verifiable coding problems over 2.5 weeks on 32 H100 GPUs.
- Dataset curation involved rigorous filtering of coding problems from TACO Verified, PrimeIntellect’s SYNTHETIC-1, and LiveCodeBench to ensure quality.
- A sparse Outcome Reward Model (ORM) is used to avoid reward hacking: generated code receives a reward of 1 only if it passes every sampled unit test, and 0 otherwise, with no partial credit.
- Training optimizations include GRPO+ (a stabilized variant of GRPO) and iterative context lengthening, which progressively extends the training context to improve reasoning over long contexts.
- DeepCoder-14B-Preview demonstrates strong performance on coding benchmarks like LiveCodeBench and Codeforces, matching OpenAI’s o3-mini.
- Systems optimizations in verl-pipeline cut end-to-end training time by up to 2.5x, enabling faster RL training for long-context models.
- The project is fully open-source, including datasets, code, and training logs, to democratize RL training for LLMs.
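The sparse outcome reward described above can be sketched in a few lines. This is a minimal illustration, not the project's actual judging harness: the convention that candidate code defines a function `solve`, and the `(args, expected)` test format, are assumptions for this sketch, and a real system would sandbox execution and handle stdin/stdout problems.

```python
def sparse_outcome_reward(code: str, tests: list[tuple]) -> float:
    """Binary outcome reward in the spirit of the sparse ORM above.

    Assumption for this sketch: the candidate `code` defines a
    function `solve`. Reward is 1.0 only if every sampled unit
    test passes, 0.0 otherwise -- no partial credit, which is
    what closes off reward hacking on easy subsets of tests.
    """
    ns: dict = {}
    try:
        exec(code, ns)  # real training would sandbox this
        solve = ns["solve"]
        return 1.0 if all(solve(*args) == want for args, want in tests) else 0.0
    except Exception:
        # Crashes, syntax errors, or a missing `solve` all score 0.
        return 0.0
```

For example, `sparse_outcome_reward("def solve(a, b):\n    return a + b", [((1, 2), 3), ((2, 2), 4)])` returns `1.0`, while code that fails any one test (or fails to run) returns `0.0`.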
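At the core of GRPO (and its GRPO+ variant mentioned above) is a group-relative advantage: each sampled rollout for a problem is scored against the mean and standard deviation of its own sampling group, with no learned value function. The sketch below shows only this baseline computation; GRPO+'s specific stability modifications from the post are not reproduced here.

```python
import statistics


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Group-normalized advantages as in GRPO.

    Each rollout's advantage is (reward - group mean) / group std.
    With a sparse 0/1 outcome reward, this raises the likelihood of
    passing solutions and lowers failing ones within the same problem.
    """
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0.0:
        # All rollouts tied (all pass or all fail): no learning signal.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]
```

For a group of four rollouts with rewards `[1, 0, 0, 1]`, the mean is 0.5 and the population std is 0.5, so the advantages are `[1.0, -1.0, -1.0, 1.0]`: the two passing rollouts are reinforced symmetrically against the two failing ones.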