DeepCoder: An Open-Source 14B Coder at O3-Mini Level
- #Reinforcement Learning
- #AI
- #Open Source
- DeepCoder-14B-Preview is a 14B parameter code reasoning model developed by Agentica and Together AI, achieving 60.6% Pass@1 accuracy on LiveCodeBench.
- The model is trained using reinforcement learning on 24K high-quality, verifiable coding problems over 2.5 weeks on 32 H100 GPUs.
- Dataset curation involved rigorous filtering of coding problems from TACO Verified, PrimeIntellect’s SYNTHETIC-1, and LiveCodeBench to ensure quality.
- A sparse Outcome Reward Model (ORM) is used to avoid reward hacking: generated code receives a reward of 1 only if it passes every sampled unit test, and 0 otherwise, with no partial credit.
- Training optimizations include GRPO+ (a stabilized variant of GRPO) and iterative context lengthening, which progressively extends the training context to improve reasoning over long contexts.
- DeepCoder-14B-Preview demonstrates strong performance on coding benchmarks like LiveCodeBench and Codeforces, matching OpenAI’s o3-mini.
- Systems optimizations in verl-pipeline cut end-to-end training time by up to 2.5x, enabling faster RL training for long-context models.
- The project is fully open-source, including datasets, code, and training logs, to democratize RL training for LLMs.
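The sparse outcome reward described above can be sketched in a few lines. This is a minimal illustration, not the project's actual judging harness: the convention that candidate code defines a function `solve`, and the `(args, expected)` test format, are assumptions for this sketch, and a real system would sandbox execution and handle stdin/stdout problems.

```python
def sparse_outcome_reward(code: str, tests: list[tuple]) -> float:
    """Binary outcome reward in the spirit of the sparse ORM above.

    Assumption for this sketch: the candidate `code` defines a
    function `solve`. Reward is 1.0 only if every sampled unit
    test passes, 0.0 otherwise -- no partial credit, which is
    what closes off reward hacking on easy subsets of tests.
    """
    ns: dict = {}
    try:
        exec(code, ns)  # real training would sandbox this
        solve = ns["solve"]
        return 1.0 if all(solve(*args) == want for args, want in tests) else 0.0
    except Exception:
        # Crashes, syntax errors, or a missing `solve` all score 0.
        return 0.0
```

For example, `sparse_outcome_reward("def solve(a, b):\n    return a + b", [((1, 2), 3), ((2, 2), 4)])` returns `1.0`, while code that fails any one test (or fails to run) returns `0.0`.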
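At the core of GRPO (and its GRPO+ variant mentioned above) is a group-relative advantage: each sampled rollout for a problem is scored against the mean and standard deviation of its own sampling group, with no learned value function. The sketch below shows only this baseline computation; GRPO+'s specific stability modifications from the post are not reproduced here.

```python
import statistics


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Group-normalized advantages as in GRPO.

    Each rollout's advantage is (reward - group mean) / group std.
    With a sparse 0/1 outcome reward, this raises the likelihood of
    passing solutions and lowers failing ones within the same problem.
    """
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0.0:
        # All rollouts tied (all pass or all fail): no learning signal.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]
```

For a group of four rollouts with rewards `[1, 0, 0, 1]`, the mean is 0.5 and the population std is 0.5, so the advantages are `[1.0, -1.0, -1.0, 1.0]`: the two passing rollouts are reinforced symmetrically against the two failing ones.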