Unsloth Dynamic 2.0 GGUFs
- #benchmarking
- #machine-learning
- #quantization
- Unsloth introduces its Dynamic v2.0 quantization method, which outperforms leading quantization approaches and sets new benchmarks on 5-shot MMLU and KL Divergence.
- Dynamic v2.0 lets users run and fine-tune quantized LLMs with accuracy largely preserved, and its GGUFs work with inference engines such as llama.cpp and Ollama.
- Later updates add Qwen3.5 benchmarks and Aider Polyglot results, including a 75.6% score for Unsloth's Dynamic 3-bit DeepSeek V3.1 GGUF.
- Unsloth's collaboration with major model teams (Qwen3, Meta, Mistral, Google, Microsoft) to fix critical bugs and boost accuracy.
- Dynamic v2.0 features revamped layer selection, model-specific quants, and new formats (e.g. IQ4_NL, Q5_1) for efficiency on Apple Silicon and ARM devices.
- KL Divergence is highlighted as the gold standard for measuring quantization error, with a focus on reducing mean KL Divergence while keeping the accompanying increase in disk space minimal.
- Calibration-dataset overfitting is addressed by using the Calibration_v3 and Calibration_v5 datasets for fair testing, avoiding overfitting to Wikipedia-style calibration text.
- Subtle implementation issues made official MMLU 5-shot results hard to replicate, leading Unsloth to build a custom MMLU implementation.
- Benchmarks of the Gemma 3 QAT versions show strong results, with Unsloth's Dynamic 4-bit versions offering smaller size and higher accuracy.
- Bug fixes for Llama 4, including RoPE Scaling configuration and QK Norm issues, improving MMLU Pro accuracy from 68.58% to 71.53%.
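To make the quant formats mentioned above concrete, here is a minimal, hypothetical sketch of block-wise 4-bit quantization in the spirit of llama.cpp's simplest format, Q4_0 (the IQ4_NL variant replaces the uniform integer codes with a non-linear lookup table). The function names and the small example block are our own illustration, not Unsloth's or llama.cpp's actual code.

```python
def quantize_q4_block(block):
    """Symmetrically quantize a block of floats to 4-bit codes in [-8, 7]
    with a single shared scale -- a toy stand-in for a Q4_0-style block."""
    amax = max(abs(x) for x in block)
    scale = amax / 7.0 if amax > 0 else 1.0
    codes = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, codes

def dequantize_q4_block(scale, codes):
    """Reconstruct approximate weights from the scale and 4-bit codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.50, 0.33, 0.07, -0.21, 0.44, -0.09, 0.28]
scale, codes = quantize_q4_block(weights)
restored = dequantize_q4_block(scale, codes)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.4f}, max abs error={max_err:.4f}")
```

In the real format, blocks of 32 weights pack two 4-bit codes per byte alongside an fp16 scale, which is where the roughly 4.5 bits-per-weight footprint comes from; Dynamic v2.0's layer selection then decides per-layer which blocks get more or fewer bits.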
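The mean KL Divergence metric highlighted above compares the next-token distributions of the full-precision and quantized models at each position, then averages. A minimal sketch with toy distributions (the helper names and numbers are ours, purely for illustration):

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mean_kl(dists_fp, dists_q):
    """Average KL over every token position, as in a quantization benchmark."""
    return sum(kl_divergence(p, q) for p, q in zip(dists_fp, dists_q)) / len(dists_fp)

# Toy next-token distributions over a 4-token vocabulary:
full_precision = [0.70, 0.20, 0.07, 0.03]
quantized      = [0.65, 0.24, 0.08, 0.03]
print(kl_divergence(full_precision, quantized))
```

A quant that tracks the full-precision model closely drives this mean toward zero; the trade-off the post describes is buying a lower mean KL without paying much extra disk space.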
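The hard-to-replicate MMLU 5-shot results come down to details of prompt assembly and answer scoring. A hedged sketch of the standard convention follows; the helper names are ours and this is not Unsloth's custom implementation, only an illustration of where the subtle choices live:

```python
CHOICES = ["A", "B", "C", "D"]

def format_example(question, options, answer=None):
    """Render one MMLU question; omit the answer letter for the test item."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICES, options)]
    lines.append("Answer:" + (f" {answer}" if answer is not None else ""))
    return "\n".join(lines)

def build_prompt(subject, shots, test_question, test_options):
    """Assemble the conventional 5-shot MMLU prompt: a subject header,
    five answered examples, then the unanswered test question."""
    header = (f"The following are multiple choice questions "
              f"(with answers) about {subject}.\n\n")
    body = "\n\n".join(format_example(q, o, a) for q, o, a in shots)
    return header + body + "\n\n" + format_example(test_question, test_options)
```

Scoring is typically done by comparing the model's likelihood of the tokens " A"/" B"/" C"/" D" at the final position rather than parsing free-form output; small deviations in that choice, or in whitespace and header wording, are exactly the kind of implementation issue that shifts reported scores.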
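To illustrate why a RoPE scaling misconfiguration of the kind fixed for Llama 4 hurts long-context accuracy, here is a toy sketch of linear RoPE scaling, which stretches rotary wavelengths by a factor. The function names and the factor of 8 are hypothetical illustration values, not Llama 4's actual configuration:

```python
def rope_inv_freqs(head_dim, base=10000.0, scaling_factor=1.0):
    """Inverse rotary frequencies; linear RoPE scaling divides positions
    (equivalently, the frequencies) by scaling_factor."""
    return [1.0 / (scaling_factor * base ** (2 * i / head_dim))
            for i in range(head_dim // 2)]

def rope_angle(pos, inv_freq):
    """Rotation angle applied to one query/key dimension pair at a position."""
    return pos * inv_freq

freqs_ok  = rope_inv_freqs(128, scaling_factor=8.0)  # scaling applied
freqs_bad = rope_inv_freqs(128, scaling_factor=1.0)  # scaling silently dropped
print(rope_angle(4096, freqs_ok[0]), rope_angle(4096, freqs_bad[0]))  # 512.0 4096.0
```

If the config silently drops the factor, every rotation angle at a long position is wrong by that factor, so attention between distant tokens degrades; fixing the configuration (together with the QK Norm issue) is what recovered the MMLU Pro accuracy cited above.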