InferenceX v2: NVIDIA Blackwell vs. AMD vs. Hopper – SemiAnalysis

  • #NVIDIA-vs-AMD
  • #AI-inference
  • #GPU-benchmarking
  • InferenceX v2 (formerly InferenceMAX) is an open-source, continuously updated benchmark of AI inference performance and economics.
  • It benchmarks NVIDIA Blackwell, AMD, and Hopper GPUs, covering SKUs such as the GB300 NVL72, MI355X, B200, and H100.
  • The benchmark covers large-scale disaggregated inference of DeepSeek's large mixture-of-experts (MoE) model with wide expert parallelism (wideEP); a conceptual sketch of the prefill/decode split follows this list.
  • InferenceX v2 utilizes nearly 1,000 frontier GPUs for comprehensive benchmarking across all SKUs.
  • NVIDIA's Blackwell Ultra GB300 NVL72 and B300 are benchmarked across the entire Pareto frontier of interactivity versus throughput (see the Pareto sketch after this list).
  • AMD's MI355X shows competitive performance in FP8 disaggregated prefill but lags in FP4 due to composability issues.
  • NVIDIA's GB300 NVL72 achieves up to 100x better performance than the H100 when comparing the GB300's FP4 to the H100's FP8.
  • Running SGLang with FP8 in single-node aggregated serving, AMD delivers better performance per TCO than NVIDIA (see the cost-and-energy sketch after this list).
  • NVIDIA dominates in energy efficiency, consuming fewer picojoules of energy per token across all workloads.
  • AMD's biggest challenge is the composability of inference optimizations such as disaggregated prefill, wideEP, and FP4.
  • NVIDIA's TensorRT-LLM and the NVL72 architecture significantly boost performance, especially in high-throughput scenarios.
  • Future updates will include DeepSeek V4, TPUv7 Ironwood, and Trainium3 benchmarks.
  • The benchmark is open-source under Apache 2.0 and supported by major players like OpenAI, Microsoft, and Google DeepMind.
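
For the disaggregated-inference bullets above, here is a conceptual sketch of the prefill/decode split. Everything in it (the Request class, the worker functions, the string stand-in for a KV cache) is hypothetical and illustrative only, not any framework's real API:

```python
# Conceptual sketch only: disaggregated serving splits inference into a
# compute-bound prefill pool and a memory-bound decode pool, handing the
# KV cache off between them. Every name here is hypothetical.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Request:
    req_id: int
    prompt_tokens: list[int]
    kv_cache: str | None = None                    # stand-in for real KV tensors
    output_tokens: list[int] = field(default_factory=list)

def prefill_worker(req: Request) -> Request:
    """Prefill pool: one parallel pass over the whole prompt builds the KV cache."""
    req.kv_cache = f"kv[{len(req.prompt_tokens)} tokens]"
    return req

def decode_worker(req: Request, max_new_tokens: int = 4) -> Request:
    """Decode pool: generates one token per step against the transferred cache."""
    assert req.kv_cache is not None, "KV cache must be transferred before decode"
    for step in range(max_new_tokens):
        req.output_tokens.append(step)             # placeholder for sampled ids
    return req

# Because the two phases run on separate pools, each can be batched and scaled
# for its own bottleneck: prefill for compute, decode for memory bandwidth.
done = decode_worker(prefill_worker(Request(0, list(range(128)))))
print(done.req_id, done.kv_cache, done.output_tokens)
```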
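The Pareto-frontier bullets refer to sweeping serving configurations and keeping only the points no other configuration beats on both interactivity (tok/s/user) and throughput (tok/s/GPU). A minimal sketch, assuming hypothetical run data and field names rather than InferenceX v2 output:

```python
# A minimal sketch of extracting a Pareto frontier from benchmark sweeps:
# keep only the configurations not dominated on both axes.
from dataclasses import dataclass

@dataclass
class RunResult:
    config: str                       # e.g. parallelism degree and batch size
    tok_s_per_user: float             # interactivity (higher is better)
    tok_s_per_gpu: float              # throughput (higher is better)

def pareto_frontier(runs: list[RunResult]) -> list[RunResult]:
    """Drop any run another run ties or beats on both axes and beats on one."""
    frontier = []
    for run in runs:
        dominated = any(
            other.tok_s_per_user >= run.tok_s_per_user
            and other.tok_s_per_gpu >= run.tok_s_per_gpu
            and (other.tok_s_per_user > run.tok_s_per_user
                 or other.tok_s_per_gpu > run.tok_s_per_gpu)
            for other in runs
        )
        if not dominated:
            frontier.append(run)
    return sorted(frontier, key=lambda r: r.tok_s_per_user)

# Hypothetical sweep over serving configurations:
runs = [
    RunResult("tp8_bs64", tok_s_per_user=30.0, tok_s_per_gpu=1800.0),
    RunResult("tp8_bs16", tok_s_per_user=75.0, tok_s_per_gpu=1100.0),
    RunResult("tp4_bs16", tok_s_per_user=60.0, tok_s_per_gpu=900.0),   # dominated
    RunResult("tp8_bs4",  tok_s_per_user=140.0, tok_s_per_gpu=450.0),
]
for r in pareto_frontier(runs):
    print(f"{r.config}: {r.tok_s_per_user} tok/s/user, {r.tok_s_per_gpu} tok/s/GPU")
```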
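Finally, since several results are stated in terms of performance per TCO and picojoules per token, here is a short sketch of how both metrics reduce to simple arithmetic; the throughput, $/GPU-hour, and power figures are invented placeholders, not benchmark results:

```python
# A minimal sketch of the two economics metrics reduced to arithmetic.
SECONDS_PER_HOUR = 3600
PICOJOULES_PER_JOULE = 1e12

def tokens_per_dollar(tok_s_per_gpu: float, tco_per_gpu_hour: float) -> float:
    """Performance per TCO: throughput divided by all-in cost per GPU-hour."""
    return tok_s_per_gpu * SECONDS_PER_HOUR / tco_per_gpu_hour

def joules_per_token(avg_gpu_power_watts: float, tok_s_per_gpu: float) -> float:
    """Energy per token: 1 W = 1 J/s, so watts / (tok/s) = J/token."""
    return avg_gpu_power_watts / tok_s_per_gpu

# Hypothetical serving point: 1,500 tok/s per GPU at $3/GPU-hour and 1 kW draw.
tput, tco, power = 1500.0, 3.00, 1000.0
print(f"{tokens_per_dollar(tput, tco):,.0f} tokens per dollar")
epj = joules_per_token(power, tput)
print(f"{epj:.3f} J/token ({epj * PICOJOULES_PER_JOULE:.3g} pJ/token)")
```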