Hasty Briefs (beta)

RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

9 hours ago
  • #local-llm
  • #dual-gpu-setup
  • #llama.cpp
  • Combined an RTX 5080 and an RTX 3090 in a dual-GPU setup for local LLM inference, using an Asus Prime X570-Pro motherboard with PCIe bifurcation.
  • Configured the BIOS so that both GPUs are recognized: disabled CSM, enabled Above 4G Decoding and ReSize BAR Support, and set the PCIe links to Gen 4.
  • Built llama.cpp with CUDA flags targeting both the Ampere (RTX 3090) and Blackwell (RTX 5080) architectures, and launched it with multi-GPU tensor-split options, reaching over 80 tokens/second on the Qwen 3.6 27B Q8 model.
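
The build and launch steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the author's actual commands: the summary does not list the exact flags, so the architecture codes (86 for the RTX 3090, 120 for the RTX 5080), the VRAM-proportional split of 16:24, the layer split mode, and the placeholder model path `model.gguf` are all assumptions. The helper names are hypothetical; the flags themselves (`-DGGML_CUDA`, `-DCMAKE_CUDA_ARCHITECTURES`, `--n-gpu-layers`, `--split-mode`, `--tensor-split`) are standard CMake and llama.cpp options.

```python
import shlex

# Assumed hardware description; not taken from the original post.
# The list order must match the CUDA device order that llama.cpp reports.
GPUS = [
    {"name": "RTX 5080", "vram_gib": 16, "cuda_arch": "120"},  # Blackwell
    {"name": "RTX 3090", "vram_gib": 24, "cuda_arch": "86"},   # Ampere
]


def cmake_configure_args(gpus, build_dir="build"):
    """Build a cmake configure command that compiles CUDA kernels for every GPU."""
    archs = ";".join(sorted({g["cuda_arch"] for g in gpus}, key=int))
    return [
        "cmake", "-B", build_dir,
        "-DGGML_CUDA=ON",
        f"-DCMAKE_CUDA_ARCHITECTURES={archs}",
    ]


def tensor_split(gpus):
    """Return a --tensor-split value proportional to each GPU's VRAM (an assumed heuristic)."""
    return ",".join(str(g["vram_gib"]) for g in gpus)


def server_args(model_path, gpus):
    """Build a llama-server command that offloads all layers and splits them across GPUs."""
    return [
        "llama-server",
        "-m", model_path,              # placeholder path, hypothetical
        "--n-gpu-layers", "99",        # offload every layer to the GPUs
        "--split-mode", "layer",       # assumed; "row" is the other multi-GPU mode
        "--tensor-split", tensor_split(gpus),
    ]


if __name__ == "__main__":
    # Print shell-ready commands rather than running them.
    print(shlex.join(cmake_configure_args(GPUS)))
    print(shlex.join(server_args("model.gguf", GPUS)))
```

Splitting in proportion to VRAM is only a starting point. Because the two cards differ in speed as well as memory, the ratio that gives the best throughput may need to be found by experiment.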