Hasty Briefs (beta)

Intellect-2 Release: The First 32B Model Trained Through Globally Distributed RL

a year ago
  • #AI
  • #Decentralized Training
  • #Reinforcement Learning
  • INTELLECT-2 is the first 32B parameter model trained via globally distributed reinforcement learning.
  • The model uses PRIME-RL, a training framework for distributed asynchronous reinforcement learning, together with components such as TOPLOC (verification of inference from untrusted workers) and SHARDCAST (efficient broadcasting of updated policy weights).
  • Training data includes 285k verifiable tasks from NuminaMath-1.5, Deepscaler, and SYNTHETIC-1.
  • The model is trained from QwQ-32B and improves on it through modifications to the GRPO training recipe and advanced data-filtering techniques.
  • Future work includes increasing inference-to-training compute ratio, tool calls, crowdsourcing RL tasks, and model merging.
  • The article also includes a detailed math problem solution involving quadratic polynomials P(x) and Q(x).
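The GRPO recipe mentioned above centers on group-relative advantages: instead of a learned value baseline, each prompt is sampled several times and every rollout's reward is normalized against the mean and standard deviation of its own group. A minimal sketch of that advantage computation (the function name and plain-Python form are illustrative, not the PRIME-RL implementation):

```python
import statistics


def grpo_advantages(rewards):
    """Group-relative advantages for one prompt's group of rollouts.

    Each rollout's reward is normalized against the group's mean and
    standard deviation, so the group itself acts as the baseline.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rollouts scored identically: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Example: verifiable tasks often yield binary rewards (correct/incorrect).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

Rollouts that beat their group's average get positive advantages and are reinforced; this pairs naturally with the verifiable math and coding tasks described above, where rewards can be computed by an automatic checker.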