INTELLECT-2 Release: The First 32B Model Trained Through Globally Distributed RL
- #AI
- #Decentralized Training
- #Reinforcement Learning
- INTELLECT-2 is the first 32B parameter model trained via globally distributed reinforcement learning.
- The model uses PRIME-RL, a training framework for distributed asynchronous reinforcement learning, together with components such as TOPLOC (verifiable inference) and SHARDCAST (policy weight distribution).
- Training data includes 285k verifiable tasks from NuminaMath-1.5, Deepscaler, and SYNTHETIC-1.
- The model improves upon QwQ-32B with modifications to the GRPO training recipe and advanced data filtering techniques.
- Future work includes increasing the inference-to-training compute ratio, integrating tool calls, crowdsourcing RL tasks, and model merging.
- The article also includes a detailed math problem solution involving quadratic polynomials P(x) and Q(x).
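The GRPO recipe mentioned above scores each completion relative to the other completions sampled for the same prompt. A minimal sketch of that group-relative advantage normalization follows; the function and variable names are illustrative assumptions, not taken from the PRIME-RL codebase, and INTELLECT-2 applies further modifications on top of this baseline.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each reward against its group's
    mean and standard deviation (hypothetical helper, not PRIME-RL API)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions in the group scored identically:
        # no relative signal, so the group contributes zero gradient.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Example: four completions for one verifiable task, binary rewards.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are computed within each sampled group rather than from a learned value model, this style of recipe needs no critic network, which is part of what makes it attractive for distributed RL setups.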