Unsloth Dynamic 2.0 GGUFs
- #AI-models
- #machine-learning
- #quantization
- Unsloth introduces Dynamic v2.0 quantization, a major upgrade improving accuracy and efficiency for quantized LLMs.
- Dynamic v2.0 features revamped layer selection, model-specific quants, and new formats like Q4_NL and Q5.1 for better performance on Apple Silicon and ARM devices.
- The method includes a high-quality calibration dataset with over 1.5M tokens to enhance conversational chat performance.
- Unsloth collaborates with major AI teams (Meta, Google, Microsoft) to fix critical bugs, boosting model accuracy.
- Dynamic v2.0 now works on all model architectures, MoE and dense alike, unlike its predecessor, which was limited to MoE models.
- Benchmarking introduces a new efficiency metric that weighs MMLU accuracy against on-disk size, with Dynamic 2.0 showing superior performance.
- KL Divergence is used as the key metric for quantization accuracy, since lower divergence from the full-precision model corresponds to fewer "flips" in model answers.
- Unsloth addresses calibration dataset overfitting by using diverse datasets beyond Wikipedia for fair testing.
- Replicating MMLU 5-shot results was challenging due to subtle implementation issues, leading to custom benchmarking solutions.
- Gemma 3 QAT benchmarks show impressive results, with Dynamic 2.0 versions offering smaller sizes and higher accuracy.
- Bug fixes for Llama 4 include RoPE scaling adjustments and QK Norm epsilon corrections, significantly improving MMLU scores.
- Instructions are provided for running Llama 4 Scout with llama.cpp using Unsloth's Dynamic 2.0 quantizations.
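
As a rough sketch of the llama.cpp workflow the post walks through: the repo name, quant pattern, and file names below are assumptions for illustration; check Unsloth's Hugging Face listings for the exact GGUF names:

```shell
# Assumed repo and quant names -- verify against Unsloth's Hugging Face page.
huggingface-cli download unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF \
  --include "*Q2_K_XL*" --local-dir ./scout-gguf

# Run an interactive session, offloading layers to the GPU (-ngl 99)
# with an 8K context window; substitute the actual downloaded file name.
./llama-cli -m ./scout-gguf/<downloaded-file>.gguf \
  -ngl 99 -c 8192 --temp 0.6
```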
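
The efficiency metric mentioned above combines MMLU accuracy with disk size. One plausible form, sketched here as an assumption (the exact formula is defined in the original post), subtracts the 25% random-guess baseline of 4-choice MMLU before normalizing by size:

```python
def mmlu_efficiency(mmlu_score_pct, disk_size_gb, chance_baseline=25.0):
    """Illustrative efficiency score: MMLU gain over random guessing per GB on disk.

    NOTE: the baseline-subtraction form is an assumption for illustration;
    consult the original post for the exact definition.
    """
    # A 4-choice benchmark scores ~25% by pure guessing, so subtract that
    # floor: a model that only guesses gets an efficiency of 0.
    return (mmlu_score_pct - chance_baseline) / disk_size_gb

# A 75% MMLU model at 10 GB would score (75 - 25) / 10 = 5.0 under this form.
```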
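
The two accuracy metrics above, KL Divergence against the full-precision model and answer "flips", can be sketched in a few lines. The logits and answers below are hypothetical stand-ins for real model outputs:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(base_logits, quant_logits):
    """KL(P || Q) for one token position: P = full-precision, Q = quantized.

    Lower is better; 0 means the quantized model reproduces the
    full-precision token distribution exactly.
    """
    p = softmax(base_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kl(base_logit_seq, quant_logit_seq):
    """Average per-token KL over a sequence of logit vectors."""
    kls = [kl_divergence(p, q) for p, q in zip(base_logit_seq, quant_logit_seq)]
    return sum(kls) / len(kls)

def flip_rate(base_answers, quant_answers):
    """Fraction of prompts where the quantized model's answer differs
    from the full-precision model's answer (a "flip")."""
    flips = sum(b != q for b, q in zip(base_answers, quant_answers))
    return flips / len(base_answers)
```

In practice these comparisons run over real model logits (llama.cpp's `llama-perplexity` tool can report KL divergence against a full-precision baseline); the functions here only illustrate the arithmetic.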