InferenceX v2: NVIDIA Blackwell vs. AMD vs. Hopper – SemiAnalysis

  • #NVIDIA-vs-AMD
  • #AI-inference
  • #GPU-benchmarking
  • InferenceX v2 (formerly InferenceMAX) is an open-source, continuously updated benchmark of AI inference performance and economics.
  • It benchmarks NVIDIA Blackwell, AMD, and Hopper GPUs, covering SKUs such as the GB300 NVL72, MI355X, B200, and H100.
  • The benchmark covers large-scale disaggregated inference of DeepSeek's large mixture-of-experts (MoE) model with wide expert parallelism (wideEP); a conceptual sketch of the prefill/decode split follows this list.
  • InferenceX v2 utilizes nearly 1,000 frontier GPUs for comprehensive benchmarking across all SKUs.
  • NVIDIA's Blackwell Ultra GB300 NVL72 and B300 are benchmarked across the entire Pareto frontier of interactivity versus throughput (see the Pareto sketch after this list).
  • AMD's MI355X shows competitive performance in FP8 disaggregated prefill but lags in FP4 due to composability issues.
  • NVIDIA's GB300 NVL72 achieves up to 100x better performance than the H100 when comparing the GB300's FP4 to the H100's FP8.
  • Running SGLang with FP8 in single-node aggregated serving, AMD delivers better performance per TCO than NVIDIA (see the cost-and-energy sketch after this list).
  • NVIDIA dominates in energy efficiency, consuming fewer picojoules of energy per token across all workloads.
  • AMD's biggest challenge is the composability of inference optimizations such as disaggregated prefill, wideEP, and FP4.
  • NVIDIA's TensorRT-LLM and the NVL72 architecture significantly boost performance, especially in high-throughput scenarios.
  • Future updates will include DeepSeek V4, TPUv7 Ironwood, and Trainium3 benchmarks.
  • The benchmark is open-source under Apache 2.0 and supported by major players like OpenAI, Microsoft, and Google DeepMind.
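
For the disaggregated-inference bullets above, here is a conceptual sketch of the prefill/decode split. Everything in it (the Request class, the worker functions, the string stand-in for a KV cache) is hypothetical and illustrative only, not any framework's real API:

```python
# Conceptual sketch only: disaggregated serving splits inference into a
# compute-bound prefill pool and a memory-bound decode pool, handing the
# KV cache off between them. Every name here is hypothetical.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Request:
    req_id: int
    prompt_tokens: list[int]
    kv_cache: str | None = None                    # stand-in for real KV tensors
    output_tokens: list[int] = field(default_factory=list)

def prefill_worker(req: Request) -> Request:
    """Prefill pool: one parallel pass over the whole prompt builds the KV cache."""
    req.kv_cache = f"kv[{len(req.prompt_tokens)} tokens]"
    return req

def decode_worker(req: Request, max_new_tokens: int = 4) -> Request:
    """Decode pool: generates one token per step against the transferred cache."""
    assert req.kv_cache is not None, "KV cache must be transferred before decode"
    for step in range(max_new_tokens):
        req.output_tokens.append(step)             # placeholder for sampled ids
    return req

# Because the two phases run on separate pools, each can be batched and scaled
# for its own bottleneck: prefill for compute, decode for memory bandwidth.
done = decode_worker(prefill_worker(Request(0, list(range(128)))))
print(done.req_id, done.kv_cache, done.output_tokens)
```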
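The Pareto-frontier bullets refer to sweeping serving configurations and keeping only the points no other configuration beats on both interactivity (tok/s/user) and throughput (tok/s/GPU). A minimal sketch, assuming hypothetical run data and field names rather than InferenceX v2 output:

```python
# A minimal sketch of extracting a Pareto frontier from benchmark sweeps:
# keep only the configurations not dominated on both axes.
from dataclasses import dataclass

@dataclass
class RunResult:
    config: str                       # e.g. parallelism degree and batch size
    tok_s_per_user: float             # interactivity (higher is better)
    tok_s_per_gpu: float              # throughput (higher is better)

def pareto_frontier(runs: list[RunResult]) -> list[RunResult]:
    """Drop any run another run ties or beats on both axes and beats on one."""
    frontier = []
    for run in runs:
        dominated = any(
            other.tok_s_per_user >= run.tok_s_per_user
            and other.tok_s_per_gpu >= run.tok_s_per_gpu
            and (other.tok_s_per_user > run.tok_s_per_user
                 or other.tok_s_per_gpu > run.tok_s_per_gpu)
            for other in runs
        )
        if not dominated:
            frontier.append(run)
    return sorted(frontier, key=lambda r: r.tok_s_per_user)

# Hypothetical sweep over serving configurations:
runs = [
    RunResult("tp8_bs64", tok_s_per_user=30.0, tok_s_per_gpu=1800.0),
    RunResult("tp8_bs16", tok_s_per_user=75.0, tok_s_per_gpu=1100.0),
    RunResult("tp4_bs16", tok_s_per_user=60.0, tok_s_per_gpu=900.0),   # dominated
    RunResult("tp8_bs4",  tok_s_per_user=140.0, tok_s_per_gpu=450.0),
]
for r in pareto_frontier(runs):
    print(f"{r.config}: {r.tok_s_per_user} tok/s/user, {r.tok_s_per_gpu} tok/s/GPU")
```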
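Finally, since several results are stated in terms of performance per TCO and picojoules per token, here is a short sketch of how both metrics reduce to simple arithmetic; the throughput, $/GPU-hour, and power figures are invented placeholders, not benchmark results:

```python
# A minimal sketch of the two economics metrics reduced to arithmetic.
SECONDS_PER_HOUR = 3600
PICOJOULES_PER_JOULE = 1e12

def tokens_per_dollar(tok_s_per_gpu: float, tco_per_gpu_hour: float) -> float:
    """Performance per TCO: throughput divided by all-in cost per GPU-hour."""
    return tok_s_per_gpu * SECONDS_PER_HOUR / tco_per_gpu_hour

def joules_per_token(avg_gpu_power_watts: float, tok_s_per_gpu: float) -> float:
    """Energy per token: 1 W = 1 J/s, so watts / (tok/s) = J/token."""
    return avg_gpu_power_watts / tok_s_per_gpu

# Hypothetical serving point: 1,500 tok/s per GPU at $3/GPU-hour and 1 kW draw.
tput, tco, power = 1500.0, 3.00, 1000.0
print(f"{tokens_per_dollar(tput, tco):,.0f} tokens per dollar")
epj = joules_per_token(power, tput)
print(f"{epj:.3f} J/token ({epj * PICOJOULES_PER_JOULE:.3g} pJ/token)")
```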