Hasty Briefs (beta)

Virtualizing Nvidia HGX B200 GPUs with Open Source

a day ago
  • #Open-Source
  • #GPU Virtualization
  • #NVIDIA B200
  • Ubicloud enabled GPU VMs on NVIDIA’s HGX B200 machines, which are trickier to virtualize than H100-based systems.
  • HGX B200 uses SXM modules linked through NVLink and NVSwitch for high-bandwidth GPU-to-GPU connectivity; this shared fabric is what makes virtualization challenging.
  • NVIDIA offers three virtualization models for these systems: Full Passthrough Mode, vGPU, and Shared NVSwitch Multitenancy Mode.
  • Shared NVSwitch Multitenancy Mode supports 1-, 2-, 4-, and 8-GPU VMs with full NVLink bandwidth.
  • Host preparation involves binding the GPUs to the vfio-pci driver and enabling IOMMU support.
  • The NVIDIA driver version in the VM must match the host's; this is critical for Shared NVSwitch Multitenancy Mode.
  • If the PCI topology the guest sees doesn't match what the driver expects, CUDA initialization can fail; QEMU can recreate the correct hierarchy inside the VM.
  • Stalls during VM boot caused by the GPUs' large PCI BARs can be resolved by upgrading QEMU or by disabling BAR mmap.
  • Fabric Manager controls GPU partitions and enforces isolation in Shared NVSwitch Multitenancy Mode.
  • The open-source implementation is available in Ubicloud, with components for GPU allocation and VM launch.
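Host preparation can be scripted through Linux sysfs. The sketch below is a minimal illustration, not Ubicloud's code: it checks that the kernel has created IOMMU groups, finds NVIDIA GPUs by vendor ID `0x10de`, and rebinds each one to vfio-pci through the kernel's `driver_override` mechanism. The PCI class filter (`0x0302`, "3D controller") and all helper names are assumptions; it also assumes the `vfio-pci` module is already loaded and that IOMMU was enabled in firmware and on the kernel command line. The `sysfs` parameter exists only so the logic can be exercised against a fake directory tree.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"
# Assumption: data-center GPUs enumerate as PCI class 0x0302xx ("3D controller");
# NVSwitch and other NVIDIA functions report different classes and are skipped.
GPU_CLASS_PREFIX = "0x0302"


def iommu_enabled(sysfs: Path = Path("/sys")) -> bool:
    """True if the kernel has populated any IOMMU groups (i.e. IOMMU is active)."""
    groups = sysfs / "kernel" / "iommu_groups"
    return groups.is_dir() and any(groups.iterdir())


def nvidia_gpus(sysfs: Path = Path("/sys")) -> list[str]:
    """Return PCI addresses (BDFs) of NVIDIA devices whose class looks like a GPU."""
    devices = sysfs / "bus" / "pci" / "devices"
    found = []
    for dev in sorted(devices.iterdir()):
        vendor = (dev / "vendor").read_text().strip()
        pci_class = (dev / "class").read_text().strip()
        if vendor == NVIDIA_VENDOR_ID and pci_class.startswith(GPU_CLASS_PREFIX):
            found.append(dev.name)
    return found


def bind_to_vfio(bdf: str, sysfs: Path = Path("/sys")) -> None:
    """Rebind one PCI device to vfio-pci using driver_override."""
    dev = sysfs / "bus" / "pci" / "devices" / bdf
    # 1. Tell the PCI core that only vfio-pci may claim this device.
    (dev / "driver_override").write_text("vfio-pci")
    # 2. Detach it from whichever driver currently owns it, if any.
    driver = dev / "driver"
    if driver.exists():
        (driver / "unbind").write_text(bdf)
    # 3. Ask the PCI core to re-probe the device; vfio-pci now picks it up.
    (sysfs / "bus" / "pci" / "drivers_probe").write_text(bdf)
```

On a real host, running as root, a caller would check `iommu_enabled()` first and then call `bind_to_vfio(bdf)` for each address returned by `nvidia_gpus()`. Using `driver_override` instead of writing vendor/device IDs to vfio-pci's `new_id` limits the change to the specific devices selected.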
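Because mismatched driver versions break Shared NVSwitch Multitenancy Mode, a launcher can refuse to start a VM when they differ. This is a hypothetical guard that assumes the full dotted version must match, not just the major branch; obtaining each side's version (for example, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` inside the guest) is left to the caller.

```python
def parse_driver_version(version: str) -> tuple[int, ...]:
    """Split a dotted driver version string into integer components."""
    parts = version.strip().split(".")
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognized driver version: {version!r}")
    return tuple(int(p) for p in parts)


def check_versions_match(host: str, guest: str) -> None:
    """Raise RuntimeError unless host and guest driver versions are identical."""
    # Assumption: every component must match, not only the major version.
    if parse_driver_version(host) != parse_driver_version(guest):
        raise RuntimeError(
            f"driver mismatch: host {host.strip()} vs guest {guest.strip()}; "
            "Shared NVSwitch Multitenancy requires matching versions"
        )
```

With placeholder versions, `check_versions_match("570.86.15", "570.86.15")` returns silently, while comparing `"570.86.15"` against any other build raises `RuntimeError` before the VM is launched.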
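QEMU builds the guest's PCI hierarchy from `-device` options, so both the topology fix and the BAR workaround come down to how each GPU is attached. The sketch below assumes a `q35` machine (root bus `pcie.0`) and gives every GPU its own `pcie-root-port` instead of plugging it directly into the root complex; whether this particular layout is the hierarchy CUDA expects on B200 is an assumption, since the exact arrangement isn't described here. The optional `no_mmap` flag appends vfio-pci's `x-no-mmap=on` property, which stops QEMU from mmap-ing device BARs. Upgrading QEMU is the cleaner fix, because trapped BAR accesses are slow.

```python
def vfio_gpu_args(gpus: list[str], no_mmap: bool = False) -> list[str]:
    """Build QEMU -device arguments placing each GPU behind its own root port.

    Assumes a q35 machine type, whose PCIe root bus is named "pcie.0".
    """
    args: list[str] = []
    for i, bdf in enumerate(gpus):
        port = f"rp{i}"
        # A dedicated root port gives each GPU a parent bridge in the guest,
        # rather than a flat layout with GPUs directly on the root complex.
        # Root ports need distinct chassis/slot pairs, so chassis is unique here.
        args += ["-device", f"pcie-root-port,id={port},bus=pcie.0,chassis={i + 1}"]
        gpu = f"vfio-pci,host={bdf},bus={port}"
        if no_mmap:
            # Fallback for large-BAR boot stalls: QEMU traps BAR accesses
            # instead of mmap-ing them, at a large runtime cost.
            gpu += ",x-no-mmap=on"
        args += ["-device", gpu]
    return args
```

Appending `vfio_gpu_args(["0000:17:00.0"])` to a `qemu-system-x86_64 -machine q35 ...` command line attaches one GPU behind root port `rp0`; the PCI address is a placeholder.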
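Before Fabric Manager can activate a partition, the allocator has to decide which GPUs a VM receives. The following is a hypothetical first-fit sketch that assumes an 8-GPU board whose valid partitions are aligned blocks (GPUs 0-1, 2-3, 0-3, 4-7, and so on); the real partition definitions come from Fabric Manager and may differ. Requests for sizes other than the 1-, 2-, 4-, and 8-GPU VM shapes are rejected.

```python
from typing import Optional

NUM_GPUS = 8                # assumption: one HGX board with 8 GPUs
VALID_SIZES = (1, 2, 4, 8)  # VM shapes supported in Shared NVSwitch mode


def candidate_partitions(size: int) -> list[tuple[int, ...]]:
    """Aligned GPU blocks of the given size (hypothetical partition layout)."""
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported partition size: {size}")
    return [tuple(range(start, start + size)) for start in range(0, NUM_GPUS, size)]


def allocate(size: int, busy: set[int]) -> Optional[tuple[int, ...]]:
    """First fit: return the first aligned block containing no busy GPU, else None."""
    for part in candidate_partitions(size):
        if busy.isdisjoint(part):
            return part
    return None
```

With GPU 0 busy, a 2-GPU request gets GPUs 2 and 3, a 4-GPU request gets GPUs 4 through 7, and an 8-GPU request cannot be satisfied. First fit keeps small VMs packed toward the low end, which leaves larger aligned blocks free for later requests.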