GitHub - THUDM/slime: slime is an LLM post-training framework for RL Scaling.
- #RL-framework
- #LLM-training
- #post-training
- slime is an LLM post-training framework for RL scaling, offering high-performance training and flexible data generation.
- It supports models like GLM-4 series, Qwen3 series, DeepSeek V3 series, and Llama 3.
- Core modules include training (Megatron), rollout (SGLang + router), and data buffer.
- slime powers projects like P1 (physics reasoning), RLVE (verifiable environments), TritonForge (GPU kernels), APRIL (rollout optimization), and qqr (agent evolution).
- Command-line arguments fall into three groups — Megatron, SGLang, and slime-specific — with detailed usage documentation available.
- Contributions are welcome; the repo provides guidelines covering code style, consistency, and debugging.
- Special thanks to projects like SGLang, Megatron-LM, and others, with citation instructions included.
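The three-module architecture above (Megatron training, SGLang rollout, data buffer) can be sketched as a minimal loop. All class and method names here are hypothetical stand-ins for illustration, not the actual slime API; a real deployment would route generation requests to SGLang servers and run the update step inside Megatron.

```python
# Conceptual sketch of slime's rollout -> buffer -> train loop.
# Hypothetical names throughout; not the real slime interfaces.

from collections import deque


class Rollout:
    """Stand-in for the SGLang-based rollout module: produces samples."""

    def generate(self, prompts):
        # Hypothetical: real slime sends prompts to SGLang via a router
        # and attaches rewards from a verifier or reward model.
        return [
            {"prompt": p, "response": f"<generated for {p}>", "reward": 1.0}
            for p in prompts
        ]


class DataBuffer:
    """Stand-in for slime's data buffer between rollout and training."""

    def __init__(self):
        self._queue = deque()

    def put(self, samples):
        self._queue.extend(samples)

    def get(self, n):
        return [self._queue.popleft() for _ in range(min(n, len(self._queue)))]


class Trainer:
    """Stand-in for the Megatron-based training module."""

    def train_step(self, batch):
        # Hypothetical: a real step would compute an RL objective
        # (e.g. a policy-gradient loss) and update model weights.
        return sum(s["reward"] for s in batch) / max(len(batch), 1)


rollout, buffer, trainer = Rollout(), DataBuffer(), Trainer()
buffer.put(rollout.generate(["q1", "q2"]))
avg_reward = trainer.train_step(buffer.get(2))
print(avg_reward)  # mean reward of the consumed batch
```

The decoupled buffer is the key design point: rollout and training can run at different rates (or on different GPUs), which is what enables slime's flexible data-generation schemes.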