Hasty Briefs

Unsloth Dynamic 2.0 GGUFs

4 hours ago
  • #benchmarking
  • #machine-learning
  • #quantization
  • Introduces the Unsloth Dynamic v2.0 quantization method, which outperforms leading quantization methods on 5-shot MMLU and KL divergence benchmarks.
  • Dynamic v2.0 lets quantized LLMs be run and fine-tuned with accuracy preserved, and is compatible with inference engines such as llama.cpp and Ollama.
  • Updates include Qwen3.5 benchmarks and Aider Polyglot results, where Unsloth's Dynamic 3-bit DeepSeek V3.1 GGUF scores 75.6%.
  • Unsloth's collaboration with major model teams (Qwen3, Meta, Mistral, Google, Microsoft) to fix critical bugs and boost accuracy.
  • Dynamic v2.0 features revamped layer selection, model-specific quantization schemes, and new formats (IQ4_NL, Q5_1, etc.) for efficiency on Apple Silicon and ARM devices.
  • KL divergence is highlighted as the gold-standard metric for quantization error, with the goal of reducing mean KL divergence while minimizing the increase in disk space.
  • Calibration-dataset overfitting is addressed by testing with the Calibration_v3 and v5 datasets, avoiding overfitting to Wikipedia-style calibration data.
  • Challenges in replicating MMLU 5-shot results due to subtle implementation issues, leading to the creation of a custom MMLU implementation.
  • Benchmarks of Gemma 3 QAT versions show strong results, with Dynamic 4-bit versions offering smaller size and higher accuracy than the QAT checkpoints.
  • Bug fixes for Llama 4, including RoPE Scaling configuration and QK Norm issues, improving MMLU Pro accuracy from 68.58% to 71.53%.
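As context for the KL divergence metric mentioned above, here is a minimal sketch (not Unsloth's implementation; all names and toy data are illustrative) of computing mean KL divergence between a full-precision model's and a quantized model's next-token distributions:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl_divergence(base_logits, quant_logits):
    # Mean KL(P_base || P_quant) across token positions, in nats.
    p = softmax(base_logits)
    q = softmax(quant_logits)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(kl.mean())

# Toy stand-in data: logits for 4 token positions over a 5-entry vocab.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 5))
quant = base + rng.normal(scale=0.05, size=(4, 5))  # simulated quantization error

print(mean_kl_divergence(base, base))   # 0.0: identical distributions
print(mean_kl_divergence(base, quant))  # small positive value
```

A quantization scheme with lower mean KL divergence distorts the model's output distribution less; comparing that number against the resulting file size is the trade-off the post describes.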
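The post attributes the MMLU replication gap to subtle implementation details. As an illustration of what a 5-shot harness has to get right, here is a minimal, hypothetical sketch of the conventional MMLU prompt layout (function names and data are assumptions, not Unsloth's code):

```python
def format_question(question, choices):
    # One MMLU question in the conventional "A./B./C./D. ... Answer:" layout.
    letters = "ABCD"
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(letters, choices)]
    lines.append("Answer:")
    return "\n".join(lines)

def format_mmlu_prompt(dev_examples, question, choices):
    # Few-shot prompt: solved dev examples first, then the unanswered test question.
    parts = [format_question(ex["question"], ex["choices"]) + f" {ex['answer']}\n"
             for ex in dev_examples]
    parts.append(format_question(question, choices))
    return "\n".join(parts)

# One illustrative shot (real 5-shot MMLU uses the subject's five dev examples).
dev = [{"question": "What is 2 + 2?",
        "choices": ["3", "4", "5", "6"],
        "answer": "B"}]
prompt = format_mmlu_prompt(dev, "What is 3 + 3?", ["5", "6", "7", "8"])
print(prompt)
```

Small deviations here, such as extra whitespace after "Answer:" or a different choice-letter format, shift which answer token the model scores highest, which is the kind of subtlety that motivated a custom implementation.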