llm-d: Kubernetes-Native Distributed Inference at Scale
- #AI
- #OpenSource
- #Kubernetes
- CoreWeave, Google, IBM Research, NVIDIA, and Red Hat launched the llm-d community.
- llm-d is a Kubernetes-native distributed inference serving stack for large language models.
- Features include a vLLM-Optimized Inference Scheduler, Disaggregated Serving with vLLM, Disaggregated Prefix Caching with vLLM, and Variant Autoscaling.
- llm-d adopts a layered architecture on top of vLLM, Kubernetes, and the Inference Gateway.
- The project is community-driven, licensed under the Apache License 2.0, and follows an open development model.
- Installation options include the full solution via a Helm chart or individual standalone components; a hedged install sketch follows this list.
- Collaboration happens through weekly standups, Slack discussions, and a Google Group.
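For anyone trying the full-solution path, here is a minimal sketch of what a Helm-based install typically looks like. The chart repository URL, chart name, and release name are illustrative assumptions, not coordinates from the announcement; check the llm-d documentation for the real values.

```sh
# Minimal sketch of the full-solution Helm install path.
# The repository URL and chart name are assumptions for illustration,
# not confirmed coordinates from the llm-d project.
helm repo add llm-d https://example.github.io/llm-d-charts  # assumed chart repo
helm repo update

# Install the full stack into its own namespace.
helm install llm-d llm-d/llm-d \
  --namespace llm-d \
  --create-namespace

# Confirm the components came up.
kubectl get pods --namespace llm-d
```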