Unsloth Dynamic 2.0 GGUFs
- #benchmarking
- #machine-learning
- #quantization
- Unsloth introduces its Dynamic v2.0 quantization method, which outperforms leading quantization approaches and sets new benchmarks on 5-shot MMLU and KL Divergence.
- Dynamic v2.0 lets users run and fine-tune quantized LLMs with accuracy largely preserved, and its GGUFs work with inference engines such as llama.cpp and Ollama.
- Later updates add Qwen3.5 benchmarks and Aider Polyglot results, including a 75.6% score for Unsloth's Dynamic 3-bit DeepSeek V3.1 GGUF.
- Unsloth's collaboration with major model teams (Qwen3, Meta, Mistral, Google, Microsoft) to fix critical bugs and boost accuracy.
- Dynamic v2.0 features revamped layer selection, model-specific quants, and new formats (e.g. IQ4_NL, Q5_1) for efficiency on Apple Silicon and ARM devices.
- KL Divergence is highlighted as the gold standard for measuring quantization error, with a focus on reducing mean KL Divergence while keeping the accompanying increase in disk space minimal.
- Calibration-dataset overfitting is addressed by using the Calibration_v3 and Calibration_v5 datasets for fair testing, avoiding overfitting to Wikipedia-style calibration text.
- Subtle implementation issues made official MMLU 5-shot results hard to replicate, leading Unsloth to build a custom MMLU implementation.
- Benchmarks of the Gemma 3 QAT versions show strong results, with Unsloth's Dynamic 4-bit versions offering smaller size and higher accuracy.
- Bug fixes for Llama 4, including RoPE Scaling configuration and QK Norm issues, improving MMLU Pro accuracy from 68.58% to 71.53%.
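To make the quant formats mentioned above concrete, here is a minimal, hypothetical sketch of block-wise 4-bit quantization in the spirit of llama.cpp's simplest format, Q4_0 (the IQ4_NL variant replaces the uniform integer codes with a non-linear lookup table). The function names and the small example block are our own illustration, not Unsloth's or llama.cpp's actual code.

```python
def quantize_q4_block(block):
    """Symmetrically quantize a block of floats to 4-bit codes in [-8, 7]
    with a single shared scale -- a toy stand-in for a Q4_0-style block."""
    amax = max(abs(x) for x in block)
    scale = amax / 7.0 if amax > 0 else 1.0
    codes = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, codes

def dequantize_q4_block(scale, codes):
    """Reconstruct approximate weights from the scale and 4-bit codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.50, 0.33, 0.07, -0.21, 0.44, -0.09, 0.28]
scale, codes = quantize_q4_block(weights)
restored = dequantize_q4_block(scale, codes)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.4f}, max abs error={max_err:.4f}")
```

In the real format, blocks of 32 weights pack two 4-bit codes per byte alongside an fp16 scale, which is where the roughly 4.5 bits-per-weight footprint comes from; Dynamic v2.0's layer selection then decides per-layer which blocks get more or fewer bits.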
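The mean KL Divergence metric highlighted above compares the next-token distributions of the full-precision and quantized models at each position, then averages. A minimal sketch with toy distributions (the helper names and numbers are ours, purely for illustration):

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mean_kl(dists_fp, dists_q):
    """Average KL over every token position, as in a quantization benchmark."""
    return sum(kl_divergence(p, q) for p, q in zip(dists_fp, dists_q)) / len(dists_fp)

# Toy next-token distributions over a 4-token vocabulary:
full_precision = [0.70, 0.20, 0.07, 0.03]
quantized      = [0.65, 0.24, 0.08, 0.03]
print(kl_divergence(full_precision, quantized))
```

A quant that tracks the full-precision model closely drives this mean toward zero; the trade-off the post describes is buying a lower mean KL without paying much extra disk space.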
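The hard-to-replicate MMLU 5-shot results come down to details of prompt assembly and answer scoring. A hedged sketch of the standard convention follows; the helper names are ours and this is not Unsloth's custom implementation, only an illustration of where the subtle choices live:

```python
CHOICES = ["A", "B", "C", "D"]

def format_example(question, options, answer=None):
    """Render one MMLU question; omit the answer letter for the test item."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICES, options)]
    lines.append("Answer:" + (f" {answer}" if answer is not None else ""))
    return "\n".join(lines)

def build_prompt(subject, shots, test_question, test_options):
    """Assemble the conventional 5-shot MMLU prompt: a subject header,
    five answered examples, then the unanswered test question."""
    header = (f"The following are multiple choice questions "
              f"(with answers) about {subject}.\n\n")
    body = "\n\n".join(format_example(q, o, a) for q, o, a in shots)
    return header + body + "\n\n" + format_example(test_question, test_options)
```

Scoring is typically done by comparing the model's likelihood of the tokens " A"/" B"/" C"/" D" at the final position rather than parsing free-form output; small deviations in that choice, or in whitespace and header wording, are exactly the kind of implementation issue that shifts reported scores.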
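To illustrate why a RoPE scaling misconfiguration of the kind fixed for Llama 4 hurts long-context accuracy, here is a toy sketch of linear RoPE scaling, which stretches rotary wavelengths by a factor. The function names and the factor of 8 are hypothetical illustration values, not Llama 4's actual configuration:

```python
def rope_inv_freqs(head_dim, base=10000.0, scaling_factor=1.0):
    """Inverse rotary frequencies; linear RoPE scaling divides positions
    (equivalently, the frequencies) by scaling_factor."""
    return [1.0 / (scaling_factor * base ** (2 * i / head_dim))
            for i in range(head_dim // 2)]

def rope_angle(pos, inv_freq):
    """Rotation angle applied to one query/key dimension pair at a position."""
    return pos * inv_freq

freqs_ok  = rope_inv_freqs(128, scaling_factor=8.0)  # scaling applied
freqs_bad = rope_inv_freqs(128, scaling_factor=1.0)  # scaling silently dropped
print(rope_angle(4096, freqs_ok[0]), rope_angle(4096, freqs_bad[0]))  # 512.0 4096.0
```

If the config silently drops the factor, every rotation angle at a long position is wrong by that factor, so attention between distant tokens degrades; fixing the configuration (together with the QK Norm issue) is what recovered the MMLU Pro accuracy cited above.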