Hasty Briefs

GitHub - THUDM/slime: slime is an LLM post-training framework for RL Scaling.

2 months ago
  • #RL-framework
  • #LLM-training
  • #post-training
  • slime is an LLM post-training framework for RL scaling, offering high-performance training and flexible data generation.
  • It supports model families such as GLM-4, Qwen3, DeepSeek V3, and Llama 3.
  • Its three core modules are training (Megatron), rollout (SGLang + router), and a data buffer that connects the two.
  • slime powers projects such as P1 (physics reasoning), RLVE (verifiable environments), TritonForge (GPU kernels), APRIL (rollout optimization), and qqr (agent evolution).
  • Arguments fall into three groups: Megatron arguments, SGLang arguments, and slime-specific arguments; detailed usage documentation is available.
  • Contributions are welcome, and the repository provides guidelines for keeping code style consistent and for debugging.
  • The acknowledgments credit projects such as SGLang and Megatron-LM, and citation instructions are included.
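The relationship between the three core modules (training, rollout, data buffer) can be illustrated with a minimal, synchronous toy loop. This is a sketch under stated assumptions, not slime's API: every class and method name below (`DataBuffer`, `Rollout`, `Trainer`, `run`) is hypothetical, generation and reward are trivial stand-ins, and the "weight sync" is just a version counter. In slime itself, training runs on Megatron and rollout runs on SGLang engines behind a router.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Sample:
    prompt: str
    response: str
    reward: float
    policy_version: int  # version of the weights that generated this sample


@dataclass
class DataBuffer:
    """Hypothetical stand-in for the data buffer: holds prompts and generated samples."""
    prompts: list
    samples: deque = field(default_factory=deque)
    cursor: int = 0

    def next_prompts(self, n):
        # Hand out prompts round-robin so rollout never runs out of input.
        batch = [self.prompts[(self.cursor + i) % len(self.prompts)] for i in range(n)]
        self.cursor = (self.cursor + n) % len(self.prompts)
        return batch

    def add(self, samples):
        self.samples.extend(samples)

    def pop_all(self):
        batch = list(self.samples)
        self.samples.clear()
        return batch


class Rollout:
    """Hypothetical stand-in for the rollout side (SGLang + router in slime)."""
    def __init__(self):
        self.weights_version = 0

    def generate(self, prompts):
        # Toy generation and reward: reverse the prompt, reward 1.0 if its length is even.
        return [
            Sample(p, p[::-1], float(len(p) % 2 == 0), self.weights_version)
            for p in prompts
        ]

    def load_weights(self, version):
        self.weights_version = version


class Trainer:
    """Hypothetical stand-in for the training side (Megatron in slime)."""
    def __init__(self):
        self.version = 0

    def train_step(self, samples):
        # A real trainer would compute an RL loss; here we only report mean reward
        # and bump the weights version.
        mean_reward = sum(s.reward for s in samples) / len(samples)
        self.version += 1
        return mean_reward


def run(num_rollouts, batch_size, prompts):
    buffer, rollout, trainer = DataBuffer(prompts), Rollout(), Trainer()
    history = []
    for _ in range(num_rollouts):
        buffer.add(rollout.generate(buffer.next_prompts(batch_size)))  # rollout -> buffer
        samples = buffer.pop_all()                                       # buffer -> training
        history.append(trainer.train_step(samples))
        rollout.load_weights(trainer.version)                            # training -> rollout sync
    return history, rollout.weights_version
```

For example, `run(3, 2, ["ab", "abc", "abcd"])` returns `([0.5, 1.0, 0.5], 3)`. The buffer decouples the two sides, so the generation strategy can change without touching the training code.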
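One way to picture the three argument groups is a small router that splits a flat command line among Megatron, SGLang, and slime. This is a hypothetical sketch, not slime's parser. It assumes, as the slime README describes, that SGLang flags are passed with a `--sglang-` prefix; the set of slime-specific flag names is supplied by the caller; and any other flag is treated as a Megatron flag. The function name `split_args` is illustrative.

```python
def split_args(argv, slime_flags):
    """Partition a flat argv list into Megatron, SGLang, and slime-specific groups.

    Assumptions (for illustration only):
      * SGLang flags carry a `--sglang-` prefix, which is stripped before forwarding.
      * `slime_flags` is a caller-supplied set of slime-specific flag names.
      * Every other flag is treated as a Megatron flag.
      * A flag's value, if any, is the next token when it does not start with "--".
    """
    groups = {"megatron": [], "sglang": [], "slime": []}
    prefix = "--sglang-"
    i = 0
    while i < len(argv):
        flag = argv[i]
        # Attach the following token as this flag's value unless it is another flag.
        value = []
        if i + 1 < len(argv) and not argv[i + 1].startswith("--"):
            value = [argv[i + 1]]
        if flag.startswith(prefix):
            groups["sglang"] += ["--" + flag[len(prefix):]] + value
        elif flag in slime_flags:
            groups["slime"] += [flag] + value
        else:
            groups["megatron"] += [flag] + value
        i += 1 + len(value)
    return groups


argv = [
    "--tensor-model-parallel-size", "2",
    "--sglang-mem-fraction-static", "0.8",
    "--rollout-batch-size", "32",
    "--bf16",
]
groups = split_args(argv, slime_flags={"--rollout-batch-size"})
```

With these illustrative flags, `groups["sglang"]` is `["--mem-fraction-static", "0.8"]`, `groups["slime"]` is `["--rollout-batch-size", "32"]`, and the remaining flags go to Megatron. Treating Megatron as the default group keeps the router short, since only the SGLang prefix and the slime flag set need to be known.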