NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference
11 hours ago
- #NVIDIA
- #AI Hardware
- #Inference Performance
- NVIDIA DGX Spark is a compact, all-in-one machine bringing supercomputing-class performance to a desktop workstation.
- Features a full-metal chassis with a sleek champagne-gold finish and metal foam panels for cooling.
- Connectivity includes four USB-C ports (one supporting 240W power delivery), HDMI, 10 GbE, and two QSFP ports (200 Gbps).
- Powered by the NVIDIA GB10 Grace Blackwell Superchip with 20 CPU cores and 1 PFLOP of sparse FP4 tensor performance.
- 128 GB of unified LPDDR5x memory shared between CPU and GPU, enabling large model loading without VRAM transfers.
- Performance benchmarks show strengths in smaller models and batching, with limitations due to memory bandwidth.
- Supports speculative decoding (EAGLE3) for up to 2× speed-up in inference throughput.
- Efficient thermal design with stable performance under load and minimal fan noise.
- Ideal for model prototyping, lightweight on-device inference, and memory-coherent GPU research.
- Pre-installed Docker allows easy model serving via SGLang and Ollama, with OpenAI-compatible API endpoints.