Hasty Briefs (beta)

Explorations of RDMA in LLM Systems

12 days ago
  • #RDMA
  • #High-Performance Computing
  • #LLM Systems
  • The team built an RDMA communication library based on Unordered Reliable Datagram (URD) semantics, compatible with AWS EFA and NVIDIA ConnectX.
  • Applied the library to KvCache transfer in disaggregated inference, model-parameter updates in RL post-training, and MoE communication.
  • Identified pain points with collective communication, including static participant groups, blocking initialization, unnecessary ordering guarantees, and rigid tensor requirements.
  • Highlighted challenges with RDMA, such as lack of portable libraries, vendor lock-in (NVIDIA ConnectX), and performance discrepancies across NICs.
  • Developed a general RDMA library focusing on reliable, unordered delivery, supporting both two-sided SEND/RECV and one-sided WRITE_IMM operations.
  • Optimized MoE kernel performance, achieving better decode speeds than DeepEP on ConnectX-7 and usable performance on EFA.
  • Shared insights on the SRD vs. RC transports, emphasizing SRD's programming-model simplicity despite EFA's lower per-NIC bandwidth.
  • Open-sourced the library and published the findings as an arXiv paper, a GitHub repository, and blog posts.
  • Reflected on the team's rapid progress in RDMA and systems optimization, from initial struggles to significant contributions in less than a year.
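The summary mentions one-sided WRITE_IMM over reliable-but-unordered delivery: the sender writes payload directly into the receiver's memory, and the immediate value tells the receiver which piece landed, in whatever order completions arrive. A minimal, purely illustrative Python sketch of that receiver-side bookkeeping (the `(transfer_id, chunk)` encoding and class names are assumptions, not the library's actual API):

```python
import random

class UnorderedReceiver:
    """Tracks WRITE_IMM-style completions that may arrive in any order.

    Hypothetical sketch: each chunk of a transfer carries an immediate
    value packing (transfer_id, chunk_index); the receiver marks chunks
    done and reports when the whole transfer has landed.
    """
    def __init__(self, transfer_id, num_chunks):
        self.transfer_id = transfer_id
        self.pending = set(range(num_chunks))

    def on_write_imm(self, imm):
        # Assumed encoding: transfer id in the high 16 bits, chunk index low.
        tid, chunk = imm >> 16, imm & 0xFFFF
        assert tid == self.transfer_id
        self.pending.discard(chunk)
        return not self.pending  # True once every chunk has arrived

def make_imm(tid, chunk):
    return (tid << 16) | chunk

# Simulate a reliable-but-unordered fabric: all chunks arrive, order arbitrary.
rx = UnorderedReceiver(transfer_id=7, num_chunks=8)
imms = [make_imm(7, c) for c in range(8)]
random.shuffle(imms)
done = [rx.on_write_imm(i) for i in imms]
assert done[-1] and not any(done[:-1])
print("transfer complete after", len(imms), "completions")
```

Because delivery is reliable, the receiver never needs timeouts or retransmit logic here, only a record of which chunks remain outstanding.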
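On the SRD vs. RC point: one commonly cited simplicity argument is connection state. RC is connection-oriented, so each endpoint keeps a queue pair per remote peer, while SRD queue pairs are datagram-style and address peers per send. A back-of-envelope sketch under those assumptions (the function names and the 8-GPUs-per-node default are illustrative, not from the post):

```python
def rc_queue_pairs(nodes, gpus_per_node=8):
    """RC: assume one QP per (local endpoint, remote endpoint) pair,
    so connection state grows quadratically with cluster size."""
    endpoints = nodes * gpus_per_node
    return endpoints * (endpoints - 1)

def srd_queue_pairs(nodes, gpus_per_node=8, qps_per_endpoint=1):
    """SRD-style datagram QPs pick the destination per send,
    so each endpoint can reach every peer from a fixed QP count."""
    return nodes * gpus_per_node * qps_per_endpoint

# 16 nodes x 8 GPUs: quadratic vs. linear connection state.
print(rc_queue_pairs(16), srd_queue_pairs(16))
```

In practice real deployments use more than one QP per endpoint for parallelism, but the asymptotic contrast (quadratic vs. linear in cluster size) is the point.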