GitHub - THUDM/slime: slime is an LLM post-training framework for RL Scaling.
- #RL-framework
- #LLM-training
- #post-training
- slime is an LLM post-training framework for RL scaling, offering high-performance training and flexible data generation.
- It supports models like GLM-4 series, Qwen3 series, DeepSeek V3 series, and Llama 3.
- Core modules include training (Megatron), rollout (SGLang + router), and data buffer.
- slime powers projects like P1 (physics reasoning), RLVE (verifiable environments), TritonForge (GPU kernels), APRIL (rollout optimization), and qqr (agent evolution).
- Command-line arguments fall into three groups — Megatron, SGLang, and slime-specific — with detailed usage documentation available.
- Contributions are welcome; the repo provides guidelines covering code style, consistency, and debugging.
- Special thanks to projects like SGLang, Megatron-LM, and others, with citation instructions included.
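The three-module architecture above (Megatron training, SGLang rollout, data buffer) can be sketched as a minimal loop. All class and method names here are hypothetical stand-ins for illustration, not the actual slime API; a real deployment would route generation requests to SGLang servers and run the update step inside Megatron.

```python
# Conceptual sketch of slime's rollout -> buffer -> train loop.
# Hypothetical names throughout; not the real slime interfaces.

from collections import deque


class Rollout:
    """Stand-in for the SGLang-based rollout module: produces samples."""

    def generate(self, prompts):
        # Hypothetical: real slime sends prompts to SGLang via a router
        # and attaches rewards from a verifier or reward model.
        return [
            {"prompt": p, "response": f"<generated for {p}>", "reward": 1.0}
            for p in prompts
        ]


class DataBuffer:
    """Stand-in for slime's data buffer between rollout and training."""

    def __init__(self):
        self._queue = deque()

    def put(self, samples):
        self._queue.extend(samples)

    def get(self, n):
        return [self._queue.popleft() for _ in range(min(n, len(self._queue)))]


class Trainer:
    """Stand-in for the Megatron-based training module."""

    def train_step(self, batch):
        # Hypothetical: a real step would compute an RL objective
        # (e.g. a policy-gradient loss) and update model weights.
        return sum(s["reward"] for s in batch) / max(len(batch), 1)


rollout, buffer, trainer = Rollout(), DataBuffer(), Trainer()
buffer.put(rollout.generate(["q1", "q2"]))
avg_reward = trainer.train_step(buffer.get(2))
print(avg_reward)  # mean reward of the consumed batch
```

The decoupled buffer is the key design point: rollout and training can run at different rates (or on different GPUs), which is what enables slime's flexible data-generation schemes.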