Hasty Briefs

Unsloth Dynamic 2.0 GGUFs

9 months ago
  • #AI-models
  • #machine-learning
  • #quantization
  • Unsloth introduces Dynamic 2.0 quantization, a major upgrade that improves accuracy and efficiency for quantized LLMs.
  • Dynamic 2.0 features revamped layer selection, model-specific quantization choices, and formats such as IQ4_NL and Q5_1 for better performance on Apple Silicon and ARM devices.
  • The method includes a high-quality calibration dataset with over 1.5M tokens to enhance conversational chat performance.
  • Unsloth collaborates with major AI teams (Meta, Google, Microsoft) to fix critical bugs, boosting model accuracy.
  • Dynamic 2.0 now works on all models, including MoEs and non-MoEs, unlike its predecessor, which was limited to MoE architectures.
  • Benchmarking introduces a new efficiency metric that weighs MMLU score against disk size, on which Dynamic 2.0 shows superior performance.
  • KL divergence is used as the key metric for quantization accuracy, with the goal of minimizing "flips" (cases where the quantized model's answer differs from the full-precision model's).
  • Unsloth addresses calibration dataset overfitting by using diverse datasets beyond Wikipedia for fair testing.
  • Replicating MMLU 5-shot results was challenging due to subtle implementation issues, leading to custom benchmarking solutions.
  • Gemma 3 QAT benchmarks show impressive results, with Dynamic 2.0 versions offering smaller sizes and higher accuracy.
  • Bug fixes for Llama 4 include RoPE scaling adjustments and QK Norm epsilon corrections, significantly improving MMLU scores.
  • Instructions are provided for running Llama 4 Scout with llama.cpp, showcasing Unsloth's Dynamic 2.0 quantizations.
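The brief does not reproduce the blog's exact efficiency formula. As a minimal sketch, assuming a simple accuracy-per-gigabyte ratio (the function name and numbers below are illustrative, not Unsloth's actual metric), the idea of trading MMLU score against disk size could look like:

```python
def efficiency(mmlu_score: float, disk_size_gb: float) -> float:
    """Hypothetical efficiency: MMLU points per gigabyte of disk.

    The blog's real metric may weight the two terms differently; this
    ratio only illustrates combining accuracy and size into one number.
    """
    return mmlu_score / disk_size_gb

# Toy comparison: a much smaller quant with similar accuracy wins.
small_quant = efficiency(mmlu_score=70.0, disk_size_gb=18.0)
large_quant = efficiency(mmlu_score=71.0, disk_size_gb=40.0)
print(small_quant > large_quant)
```

Under a metric of this shape, a quant that gives up a fraction of a point of MMLU while halving the file size comes out ahead, which matches the brief's framing of Dynamic 2.0 as size-efficient.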
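The KL-divergence comparison mentioned above can be sketched as follows: compute KL(P || Q) between the full-precision model's next-token distribution P and the quantized model's distribution Q, so that lower divergence means fewer answer flips. The toy distributions below are illustrative; the real evaluation aggregates over many tokens.

```python
import math

def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL(P || Q) over a next-token probability distribution.

    eps guards against log(0) when a token has zero probability.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy next-token distributions: full-precision vs. quantized model.
p_full = [0.70, 0.20, 0.10]
q_quant = [0.65, 0.25, 0.10]
print(kl_divergence(p_full, q_quant))  # small positive value: mild drift
```

A quantization scheme that keeps this divergence near zero preserves the original model's ranking of candidate tokens, which is why it tracks answer flips better than disk size alone.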