RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8


Combined an RTX 5080 and an RTX 3090 in a dual-GPU setup for local LLM inference, using an Asus Prime X570-Pro motherboard with PCIe bifurcation.
Configured the BIOS by disabling CSM, enabling Above 4G Decoding and Re-Size BAR Support, and setting the PCIe links to Gen 4 so that both GPUs were recognized properly.
Built llama.cpp with CUDA support for the Ampere and Blackwell architectures and launched it with multi-GPU tensor-split options, reaching over 80 tokens/second on the Qwen 3.6 27B Q8 model.
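The summary does not give the exact build flags or launch options, so the following is only a sketch under stated assumptions. It checks whether the Q8 weights fit in the two cards' combined VRAM and assembles plausible llama.cpp commands. Assumptions: nominal VRAM of 24 GB (RTX 3090) and 16 GB (RTX 5080), compute capabilities 8.6 (Ampere) and 12.0 (Blackwell), exactly 27 billion parameters stored as GGUF Q8_0 (8.5 bits per weight), a tensor split proportional to VRAM, and a hypothetical model filename. `GGML_CUDA`, `CMAKE_CUDA_ARCHITECTURES`, `--n-gpu-layers`, `--split-mode`, and `--tensor-split` are real CMake/llama.cpp options, but the values are illustrative, not the author's configuration. Building for compute capability 12.0 requires CUDA Toolkit 12.8 or newer.

```python
"""Sketch: size a 27B Q8_0 GGUF against an RTX 3090 + RTX 5080 and
assemble illustrative llama.cpp build and launch commands (assumed values)."""

# Assumed per-card VRAM (nominal specs) and CUDA compute capabilities.
GPUS = [
    {"name": "RTX 3090", "vram_gib": 24, "compute_capability": "86"},   # Ampere
    {"name": "RTX 5080", "vram_gib": 16, "compute_capability": "120"},  # Blackwell
]

# GGUF Q8_0 stores blocks of 32 int8 weights plus one fp16 scale:
# (32 * 1 + 2) bytes / 32 weights = 8.5 bits per weight.
Q8_0_BITS_PER_WEIGHT = 8.5


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB.

    Ignores the KV cache, compute buffers, and any tensors kept at higher precision.
    """
    return n_params * bits_per_weight / 8 / 2**30


def tensor_split(gpus) -> str:
    """Proportions for --tensor-split, here simply each card's VRAM in GiB."""
    return ",".join(str(g["vram_gib"]) for g in gpus)


def cuda_architectures(gpus) -> str:
    """Semicolon-separated list for -DCMAKE_CUDA_ARCHITECTURES."""
    return ";".join(g["compute_capability"] for g in gpus)


def build_cmd(gpus) -> list[str]:
    """CMake configure step with CUDA enabled for both architectures."""
    return ["cmake", "-B", "build", "-DGGML_CUDA=ON",
            f"-DCMAKE_CUDA_ARCHITECTURES={cuda_architectures(gpus)}"]


def launch_cmd(model_path: str, gpus) -> list[str]:
    """llama-server invocation that offloads all layers across both GPUs."""
    return ["llama-server", "-m", model_path,
            "--n-gpu-layers", "99",              # more layers than the model has: offload all
            "--split-mode", "layer",             # assign whole layers to each card
            "--tensor-split", tensor_split(gpus)]


if __name__ == "__main__":
    total_vram = sum(g["vram_gib"] for g in GPUS)
    size = weights_gib(27e9, Q8_0_BITS_PER_WEIGHT)
    print(f"weights ~{size:.1f} GiB of {total_vram} GiB total VRAM")
    print(" ".join(build_cmd(GPUS)))
    # Hypothetical filename; substitute the actual GGUF path.
    print(" ".join(launch_cmd("Qwen3.6-27B-Q8_0.gguf", GPUS)))
```

Under these assumptions the weights come to roughly 26.7 GiB, which neither card could hold alone but which leaves about 13 GiB across both for the KV cache and compute buffers. A VRAM-proportional split is only a starting point; because those buffers also consume memory, the ratio may need tuning in practice.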
