Hasty Briefs


LLM-D: Kubernetes-Native Distributed Inference at Scale

a year ago
  • #AI
  • #OpenSource
  • #Kubernetes
  • CoreWeave, Google, IBM Research, NVIDIA, and Red Hat launched the llm-d community.
  • llm-d is a Kubernetes-native distributed inference serving stack for large language models.
  • Features include a vLLM-optimized inference scheduler, disaggregated serving, disaggregated prefix caching (both built on vLLM), and variant autoscaling.
  • llm-d adopts a layered architecture on top of vLLM, Kubernetes, and Inference Gateway.
  • The project is community-driven, licensed under the Apache License 2.0, and follows an open development model.
  • Installation options include deploying the full stack via a Helm chart or installing individual components separately.
  • Weekly standups, Slack discussions, and Google Groups are used for collaboration.
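For the Helm-based installation mentioned above, a typical deployment would follow the standard Helm workflow sketched below. The repository URL and chart name here are placeholders, not the project's actual values — consult the llm-d documentation for the real ones.

```shell
# Sketch of a Helm-based install; the repo URL and chart name are
# hypothetical placeholders, not confirmed by the llm-d project.
helm repo add llm-d https://example.com/llm-d-charts   # assumed repo location
helm repo update

# Install the full stack into its own namespace.
helm install llm-d llm-d/llm-d \
  --namespace llm-d \
  --create-namespace
```

Installing individual components instead would mean selecting the corresponding sub-charts or manifests rather than the umbrella chart.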