ButterflyQuant: Ultra-low-bit LLM Quantization

  • #large language models
  • #machine learning
  • #quantization
  • ButterflyQuant introduces learnable orthogonal butterfly transforms for ultra-low-bit LLM quantization.
  • Addresses the catastrophic performance loss in 2-bit quantization caused by activation outliers.
  • Replaces fixed Hadamard transforms with continuous, learnable butterfly transforms for layer-adaptive rotations.
  • Ensures orthogonality by construction, providing theoretical guarantees for outlier suppression.
  • Achieves O(n log n) computational complexity with only (n log n)/2 learnable parameters (see the first sketch after this list).
  • Introduces a uniformity regularization that promotes smoother activation distributions for better quantization (illustrated in the second sketch below).
  • Requires minimal calibration (128 samples) and converges quickly on a single GPU.
  • Under 2-bit quantization, reaches 15.4 perplexity on LLaMA-2-7B versus QuaRot's 22.1.
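
The orthogonality-by-construction and the (n log n)/2 parameter count follow directly from the butterfly factorization: each of the log2(n) stages is a bank of n/2 planar (Givens) rotations, and a product of rotations is always orthogonal. A minimal PyTorch sketch of the idea, assuming a power-of-two dimension; the `ButterflyRotation` class, its zero-angle (identity) initialization, and all names are illustrative, not taken from the paper's code:

```python
import math

import torch


class ButterflyRotation(torch.nn.Module):
    """Minimal sketch of a learnable orthogonal butterfly transform.

    Each of the log2(n) stages applies n/2 independent 2x2 Givens
    rotations, so the composed transform is orthogonal for any angle
    values, with (n/2) * log2(n) parameters and O(n log n) compute.
    """

    def __init__(self, n: int):
        super().__init__()
        assert n > 1 and n & (n - 1) == 0, "n must be a power of two"
        self.n = n
        self.stages = int(math.log2(n))
        # One rotation angle per butterfly pair per stage: (n log n)/2 total.
        # Zero init gives the identity; any init stays orthogonal.
        self.theta = torch.nn.Parameter(torch.zeros(self.stages, n // 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        lead = x.shape[:-1]
        x = x.reshape(-1, self.n)
        for s in range(self.stages):
            stride = 1 << s
            pairs = self.n // (2 * stride)
            # Group coordinates so index i is rotated against i + stride.
            x = x.view(-1, pairs, 2, stride)
            a, b = x[..., 0, :], x[..., 1, :]
            theta = self.theta[s].view(pairs, stride)
            c, t = torch.cos(theta), torch.sin(theta)
            # Apply the 2x2 rotation [[c, -t], [t, c]] to every pair;
            # each stage costs O(n), so the full transform is O(n log n).
            x = torch.stack((c * a - t * b, t * a + c * b), dim=-2)
            x = x.reshape(-1, self.n)
        return x.reshape(*lead, self.n)
```

`ButterflyRotation(8)(torch.eye(8))` materializes the dense rotation; multiplying it by its transpose returns the identity for any angle values, which is the orthogonality-by-construction property the brief refers to.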
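The uniformity regularization and the lightweight calibration can be sketched together. The brief does not give the paper's exact regularizer, so the penalty below is an illustrative stand-in (it pushes the kurtosis of rotated activations toward 1.8, the kurtosis of a uniform distribution); `fake-quant` rounding, `calibrate`, and all hyperparameters are likewise hypothetical:

```python
def uniformity_penalty(z: torch.Tensor) -> torch.Tensor:
    """Stand-in uniformity regularizer: penalize heavy tails by pushing
    kurtosis toward 1.8 (uniform); a Gaussian has kurtosis 3."""
    z = z.reshape(-1).float()
    z = z - z.mean()
    var = z.pow(2).mean()
    kurt = z.pow(4).mean() / (var.pow(2) + 1e-12)
    return (kurt - 1.8).pow(2)


def calibrate(rotation: ButterflyRotation,
              weight: torch.Tensor,   # (out_features, n) of one linear layer
              x: torch.Tensor,        # small calibration batch, e.g. 128 rows
              steps: int = 200,
              lam: float = 0.1,
              bits: int = 2) -> None:
    """Toy calibration loop: learn the rotation angles so the rotated
    weight survives symmetric round-to-nearest low-bit quantization."""
    qmax = 2 ** (bits - 1) - 1
    opt = torch.optim.Adam(rotation.parameters(), lr=1e-3)
    y_ref = x @ weight.t()                 # full-precision reference output
    for _ in range(steps):
        w_rot = rotation(weight)           # rotate weight rows
        x_rot = rotation(x)                # matching input rotation
        scale = w_rot.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
        q = (w_rot / scale).round().clamp(-qmax, qmax) * scale
        q = w_rot + (q - w_rot).detach()   # straight-through estimator
        # Orthogonality makes x_rot @ w_rot.t() == x @ weight.t(), so the
        # first term isolates the quantization error of the rotated weight.
        loss = (x_rot @ q.t() - y_ref).pow(2).mean() \
               + lam * uniformity_penalty(x_rot)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

With only the rotation angles as trainable parameters and a batch on the order of 128 samples, a loop like this runs comfortably on a single GPU, which is consistent with the brief's claim of minimal calibration cost.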