Show HN: Quant Picker – which GGUF file fits your model and machine

7 hours ago

GGUF models ship at several quantization levels, each trading numerical precision for a smaller file at some cost in output quality.
Higher-bit quants (e.g., Q6, Q5) are near-lossless, while low-bit quants (e.g., below Q3) show a noticeable quality drop.
For each quant, the tool estimates the file size and the memory left over for context, then recommends the highest quant that still leaves room for at least 8k tokens of context.
Q4_K_M is considered the sweet spot; if your machine forces you below it, a smaller model might be the better choice.
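
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the bits-per-weight table holds rough, assumed values, and `estimate_file_gb`, `pick_quant`, and the `kv_gb_per_1k_tokens` parameter are hypothetical names introduced here.

```python
# Rough, assumed bits-per-weight figures for common GGUF quant types.
# Real files differ by architecture and tensor mix; treat these as placeholders.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
    "Q2_K": 3.0,
}


def estimate_file_gb(n_params_billion, bits_per_weight):
    """Estimated file size in GB: parameters times bits per weight, over 8 bits per byte."""
    return n_params_billion * bits_per_weight / 8


def pick_quant(n_params_billion, memory_gb, kv_gb_per_1k_tokens, min_context_k=8):
    """Return (name, file_gb, free_gb) for the highest quant that still leaves
    room for min_context_k thousand tokens of KV cache, or None if none fits."""
    needed_gb = min_context_k * kv_gb_per_1k_tokens
    # Try quants from most to fewest bits per weight.
    by_bits = sorted(APPROX_BITS_PER_WEIGHT.items(), key=lambda kv: -kv[1])
    for name, bpw in by_bits:
        file_gb = estimate_file_gb(n_params_billion, bpw)
        free_gb = memory_gb - file_gb
        if free_gb >= needed_gb:
            return name, file_gb, free_gb
    return None


if __name__ == "__main__":
    # Hypothetical 8B-parameter model on a machine with 6 GB usable memory,
    # assuming 0.125 GB of KV cache per 1k tokens.
    print(pick_quant(8, 6.0, 0.125))
```

A `None` result corresponds to the case where no quant of that model fits, which is where switching to a smaller model becomes the practical option.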
File sizes are estimates, not exact figures. The KV-cache calculation assumes a typical GQA (grouped-query attention) architecture, and the maximum context length varies by model.
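
For a sense of how a KV-cache estimate under GQA works, here is a small sketch using the standard sizing formula (keys plus values, per layer, per KV head). The function names and the example model shape (32 layers, 8 KV heads, head dimension 128, 16-bit cache) are assumptions chosen for illustration, not values taken from the tool.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """KV-cache size in bytes for n_tokens of context.

    The leading 2 accounts for storing one key and one value tensor per layer.
    With GQA, n_kv_heads is smaller than the number of query heads, which is
    what keeps the cache small.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


def max_context_tokens(free_bytes, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Largest token count whose KV cache fits in free_bytes."""
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bytes_per_elem)
    return free_bytes // per_token


if __name__ == "__main__":
    # Assumed shape: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
    gib = 1024 ** 3
    print(kv_cache_bytes(32, 8, 128, 8192) / gib)    # cache size in GiB at 8k context
    print(max_context_tokens(2 * gib, 32, 8, 128))   # tokens that fit in 2 GiB
```

Models without GQA, or with a quantized KV cache, would need different inputs, which is one reason such estimates are approximate.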
Additional tools include a hardware compatibility checker and a cost calculator that compares buying hardware, renting it, and using an API.
