Unsloth Dynamic 2.0 GGUFs
- #AI-models
- #machine-learning
- #quantization
- Unsloth introduces Dynamic v2.0 quantization, a major upgrade improving accuracy and efficiency for quantized LLMs.
- Dynamic v2.0 features revamped layer selection, model-specific quants, and new formats like Q4_NL and Q5.1 for better performance on Apple Silicon and ARM devices.
- The method includes a high-quality calibration dataset with over 1.5M tokens to enhance conversational chat performance.
- Unsloth collaborates with major AI teams (Meta, Google, Microsoft) to fix critical bugs, boosting model accuracy.
- Dynamic v2.0 now works on all model architectures, MoE and dense alike, unlike its predecessor, which was limited to MoE models.
- Benchmarking introduces a new efficiency metric that weighs MMLU accuracy against on-disk size, with Dynamic 2.0 showing superior performance.
- KL Divergence is used as the key metric for quantization accuracy, since lower divergence from the full-precision model corresponds to fewer "flips" in model answers.
- Unsloth addresses calibration dataset overfitting by using diverse datasets beyond Wikipedia for fair testing.
- Replicating MMLU 5-shot results was challenging due to subtle implementation issues, leading to custom benchmarking solutions.
- Gemma 3 QAT benchmarks show impressive results, with Dynamic 2.0 versions offering smaller sizes and higher accuracy.
- Bug fixes for Llama 4 include RoPE scaling adjustments and QK Norm epsilon corrections, significantly improving MMLU scores.
- Instructions are provided for running Llama 4 Scout with llama.cpp using Unsloth's Dynamic 2.0 quantizations.
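
As a rough sketch of the llama.cpp workflow the post walks through: the repo name, quant pattern, and file names below are assumptions for illustration; check Unsloth's Hugging Face listings for the exact GGUF names:

```shell
# Assumed repo and quant names -- verify against Unsloth's Hugging Face page.
huggingface-cli download unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF \
  --include "*Q2_K_XL*" --local-dir ./scout-gguf

# Run an interactive session, offloading layers to the GPU (-ngl 99)
# with an 8K context window; substitute the actual downloaded file name.
./llama-cli -m ./scout-gguf/<downloaded-file>.gguf \
  -ngl 99 -c 8192 --temp 0.6
```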
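
The efficiency metric mentioned above combines MMLU accuracy with disk size. One plausible form, sketched here as an assumption (the exact formula is defined in the original post), subtracts the 25% random-guess baseline of 4-choice MMLU before normalizing by size:

```python
def mmlu_efficiency(mmlu_score_pct, disk_size_gb, chance_baseline=25.0):
    """Illustrative efficiency score: MMLU gain over random guessing per GB on disk.

    NOTE: the baseline-subtraction form is an assumption for illustration;
    consult the original post for the exact definition.
    """
    # A 4-choice benchmark scores ~25% by pure guessing, so subtract that
    # floor: a model that only guesses gets an efficiency of 0.
    return (mmlu_score_pct - chance_baseline) / disk_size_gb

# A 75% MMLU model at 10 GB would score (75 - 25) / 10 = 5.0 under this form.
```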
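
The two accuracy metrics above, KL Divergence against the full-precision model and answer "flips", can be sketched in a few lines. The logits and answers below are hypothetical stand-ins for real model outputs:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(base_logits, quant_logits):
    """KL(P || Q) for one token position: P = full-precision, Q = quantized.

    Lower is better; 0 means the quantized model reproduces the
    full-precision token distribution exactly.
    """
    p = softmax(base_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kl(base_logit_seq, quant_logit_seq):
    """Average per-token KL over a sequence of logit vectors."""
    kls = [kl_divergence(p, q) for p, q in zip(base_logit_seq, quant_logit_seq)]
    return sum(kls) / len(kls)

def flip_rate(base_answers, quant_answers):
    """Fraction of prompts where the quantized model's answer differs
    from the full-precision model's answer (a "flip")."""
    flips = sum(b != q for b, q in zip(base_answers, quant_answers))
    return flips / len(base_answers)
```

In practice these comparisons run over real model logits (llama.cpp's `llama-perplexity` tool can report KL divergence against a full-precision baseline); the functions here only illustrate the arithmetic.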