Hasty Briefs

DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles

8 hours ago
  • #Miles
  • #DeepSeek-V4
  • #SGLang
  • SGLang and Miles provide Day-0 support for DeepSeek-V4 inference and RL training, optimized for its hybrid architecture.
  • Key inference features include ShadowRadix prefix caching, the HiSparse CPU-extended KV cache, in-graph speculative decoding, and integration with fast kernel libraries such as FlashMLA and FlashInfer.
  • Optimizations like Flash Compressor and Lightning TopK reduce HBM round-trips and latency for sparse attention.
  • Parallelism strategies spanning data (DP), tensor (TP), sequence (SP), expert (EP), pipeline (PP), and context (CP) parallelism, combined with hierarchical multi-stream overlap, improve throughput and scalability.
  • RL training with Miles supports full parallelism, FP8 training, and stability features such as R3 (Rollout Routing Replay) and indexer replay.
  • Benchmarks show SGLang maintaining near-flat decode throughput as context length grows from 4K to 900K tokens on B200 and H200 GPUs.
  • Future work is tracked in the SGLang and Miles repositories; the post closes with acknowledgments to collaborators and contributors.
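
To make the prefix caching point above more concrete, here is a minimal sketch of the general idea: requests that share a leading run of tokens can reuse work already done for that prefix. This is not ShadowRadix or any SGLang API. The class `ToyPrefixCache` and its methods are hypothetical names invented for illustration, and the structure is a plain per-token trie rather than a compressed radix tree over real KV blocks.

```python
from dataclasses import dataclass, field


@dataclass
class _Node:
    # Children keyed by token id. A real radix tree would compress
    # runs of tokens into a single edge and attach KV-cache handles.
    children: dict = field(default_factory=dict)


class ToyPrefixCache:
    """Hypothetical token-level trie that tracks which prefixes were seen."""

    def __init__(self):
        self._root = _Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self._root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Cache a token sequence; return how many tokens were newly added."""
        node, added = self._root, 0
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = _Node()
                added += 1
            node = node.children[tok]
        return added


cache = ToyPrefixCache()
cache.insert([1, 2, 3, 4])                    # first request: all 4 tokens are new
reused = cache.match_prefix([1, 2, 3, 9, 9])  # second request shares 3 leading tokens
new = cache.insert([1, 2, 3, 9, 9])           # only the 2 trailing tokens are added
```

In this toy, `reused` is the number of tokens whose prefill work a serving engine could skip for the second request, and `new` is the number it would still have to compute. A production cache would also need eviction and reference counting, which the sketch leaves out.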