InferenceX v2: NVIDIA Blackwell vs. AMD vs. Hopper – SemiAnalysis
- #NVIDIA-vs-AMD
- #AI-inference
- #GPU-benchmarking
- InferenceX v2 (formerly InferenceMAX) is an open-source, continuously updated benchmark of AI inference performance and economics.
- It benchmarks NVIDIA Blackwell, AMD, and Hopper GPUs, including SKUs such as the GB300 NVL72, MI355X, B200, and H100.
- The benchmark covers large-scale disaggregated inference of DeepSeek's large mixture-of-experts (MoE) models with wide expert parallelism (wideEP); a toy sketch of the prefill/decode split appears after this list.
- InferenceX v2 uses nearly 1,000 frontier GPUs for comprehensive benchmarking across all SKUs.
- NVIDIA's Blackwell Ultra GB300 NVL72 and B300 are benchmarked across the entire Pareto frontier of throughput per GPU versus per-user interactivity (see the frontier-extraction sketch after this list).
- AMD's MI355X is competitive in FP8 disaggregated prefill but lags in FP4 due to composability issues.
- NVIDIA's GB300 NVL72 running FP4 achieves up to 100x better performance than the H100 running FP8.
- On SGLang, AMD delivers better performance per TCO than NVIDIA for FP8 in single-node aggregated serving (the cost-per-token arithmetic is sketched after this list).
- NVIDIA dominates in energy efficiency, consuming fewer picojoules of energy per token across all workloads (see the joules-per-token arithmetic below).
- AMD's biggest challenge is composing inference optimizations such as disaggregated prefill, wideEP, and FP4.
- NVIDIA's TensorRT-LLM stack and the NVL72 architecture significantly boost performance, especially in high-throughput scenarios.
- Future updates will include DeepSeek V4, TPUv7 Ironwood, and Trainium3 benchmarks.
- The benchmark is open-source under Apache 2.0 and supported by major players like OpenAI, Microsoft, and Google DeepMind.
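Disaggregated serving splits the compute-bound prefill phase (processing the prompt) from the latency-bound decode phase (generating tokens) onto separate GPU pools, handing the KV cache between them; wideEP then spreads the MoE experts across many GPUs. The toy sketch below illustrates only the prefill/decode split; every name in it is a hypothetical illustration of the pattern, not InferenceX or DeepSeek code.

```python
# Toy sketch of disaggregated serving: prefill and decode run on separate
# worker pools, with the KV cache handed off between them. All names are
# hypothetical illustrations of the pattern, not real InferenceX code.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    kv_cache: str | None = None  # produced by prefill, consumed by decode
    output: str = ""

def prefill_worker(req: Request) -> Request:
    # Compute-bound: process the whole prompt in one pass and
    # materialize the KV cache that the decode pool will consume.
    req.kv_cache = f"kv[{len(req.prompt.split())} prompt tokens]"
    return req

def decode_worker(req: Request, max_new_tokens: int) -> Request:
    # Latency-bound: autoregressively emit one token per step
    # against the transferred KV cache.
    assert req.kv_cache is not None, "decode requires a prefilled KV cache"
    for i in range(max_new_tokens):
        req.output += f"tok{i} "
    return req

req = decode_worker(prefill_worker(Request(prompt="Explain wideEP")), 4)
print(req.kv_cache, "->", req.output.strip())
```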
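On the Pareto point: each GPU and serving configuration sweep yields a cloud of (interactivity, throughput) measurements, and only the non-dominated points form the frontier. A minimal sketch of extracting that frontier; the data points below are made up for illustration:

```python
# Minimal sketch: extract the Pareto frontier from (interactivity, throughput)
# benchmark points. The sample data is hypothetical, not InferenceX v2 output.

def pareto_frontier(points):
    """Keep points no other point beats on both axes.

    Each point is (tokens_per_s_per_user, tokens_per_s_per_gpu);
    higher is better on both axes.
    """
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

# Hypothetical sweep over batch sizes / parallelism configs:
runs = [(10, 9000), (25, 7500), (50, 5200), (75, 3100), (100, 1400),
        (25, 6000), (50, 4000)]
print(pareto_frontier(runs))
# -> [(10, 9000), (25, 7500), (50, 5200), (75, 3100), (100, 1400)]
```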
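Performance per TCO reduces to throughput divided by all-in hourly cost. A back-of-envelope sketch; the dollar and throughput figures are placeholder assumptions, not SemiAnalysis numbers:

```python
# Back-of-envelope cost per million tokens from throughput and hourly TCO.
# All dollar and throughput figures are placeholder assumptions.
def usd_per_million_tokens(tokens_per_s_per_gpu: float,
                           gpu_hourly_tco_usd: float) -> float:
    tokens_per_hour = tokens_per_s_per_gpu * 3600
    return gpu_hourly_tco_usd / tokens_per_hour * 1e6

# Hypothetical FP8 single-node operating points: a faster, pricier GPU
# can still lose on $/token to a slower, cheaper one.
print(usd_per_million_tokens(5000, 3.00))  # ~$0.167 per M tokens
print(usd_per_million_tokens(4000, 2.00))  # ~$0.139 per M tokens
```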
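Energy per token is the same arithmetic with watts in place of dollars: sustained power draw divided by token throughput. A sketch with assumed numbers:

```python
# Energy per token = sustained power draw / token throughput.
# The power and throughput values below are assumptions for illustration.
def joules_per_token(watts: float, tokens_per_s: float) -> float:
    return watts / tokens_per_s

# e.g. a 1,000 W accelerator sustaining 5,000 tok/s:
print(joules_per_token(1000, 5000))  # 0.2 J/token (2e11 pJ/token)
```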