RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8
9 hours ago
- #local-llm
- #dual-gpu-setup
- #llama.cpp
- Combined RTX 5080 and RTX 3090 in a dual-GPU setup for local LLM inference using an Asus Prime X570-Pro motherboard with PCIe bifurcation.
- Configured the BIOS for proper GPU recognition: disabled CSM, enabled Above 4G Decoding and ReSize BAR Support, and set the PCIe links to Gen 4.
- Used a llama.cpp build compiled with CUDA flags for the Ampere and Blackwell architectures, plus startup options for multi-GPU tensor splitting, to achieve over 80 tokens/second on the Qwen 3.6 27B Q8 model.
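
The build and launch steps summarized above might look roughly like the following sketch. It is not the author's exact configuration: the summary does not list the actual flags, so everything here is an assumption. It assumes the stock VRAM sizes (16 GB for the RTX 5080, 24 GB for the RTX 3090), compute capabilities 12.0 (Blackwell) and 8.6 (Ampere), a split proportional to VRAM, and a hypothetical model path. The llama.cpp options used (`-DGGML_CUDA=ON`, `-DCMAKE_CUDA_ARCHITECTURES`, `-m`, `-ngl`, `--tensor-split`) are real, but the values are illustrative.

```python
# Illustrative sketch only: GPU specs, split ratio, and paths are assumptions,
# not values taken from the original post.
import shlex

GPUS = [
    # (name, VRAM in GB, CUDA compute capability)
    ("RTX 5080", 16, "12.0"),  # Blackwell
    ("RTX 3090", 24, "8.6"),   # Ampere
]


def cuda_architectures(gpus):
    """Build the CMAKE_CUDA_ARCHITECTURES value from compute capabilities."""
    archs = sorted({int(cc.replace(".", "")) for _, _, cc in gpus})
    return ";".join(str(a) for a in archs)


def tensor_split(gpus):
    """Split the model in proportion to each card's VRAM (an assumed policy)."""
    return ",".join(str(vram) for _, vram, _ in gpus)


def cmake_configure_args(gpus):
    """CMake configure command for a CUDA-enabled llama.cpp build."""
    return [
        "cmake", "-B", "build",
        "-DGGML_CUDA=ON",
        f"-DCMAKE_CUDA_ARCHITECTURES={cuda_architectures(gpus)}",
    ]


def server_args(model_path, gpus):
    """llama-server launch command that offloads all layers across both GPUs."""
    return [
        "llama-server",
        "-m", model_path,          # hypothetical path, supplied by the caller
        "-ngl", "99",              # offload as many layers as possible
        "--tensor-split", tensor_split(gpus),
    ]


if __name__ == "__main__":
    print(shlex.join(cmake_configure_args(GPUS)))
    print(shlex.join(server_args("models/model-q8_0.gguf", GPUS)))
```

The order of the values passed to `--tensor-split` has to match the order in which CUDA enumerates the devices, which may differ from the order of the physical slots, so it is worth checking with `nvidia-smi` before relying on a ratio like this one.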