ButterflyQuant: Ultra-low-bit LLM Quantization
- #large language models
- #machine learning
- #quantization
- ButterflyQuant introduces learnable orthogonal butterfly transforms for ultra-low-bit LLM quantization.
- Addresses the issue of catastrophic performance loss in 2-bit quantization due to activation outliers.
- Replaces fixed Hadamard transforms with continuous, learnable butterfly transforms for layer-adaptive rotations (see the first code sketch after this list).
- Ensures orthogonality by construction, providing theoretical guarantees for outlier suppression.
- Achieves O(n log n) computational complexity with only (n log n)/2 learnable parameters.
- Introduces uniformity regularization to promote smoother activation distributions for better quantization (an illustrative form appears in the second sketch below).
- Requires minimal calibration (128 samples) and converges quickly on a single GPU.
- Demonstrates superior 2-bit results: 15.4 perplexity on LLaMA-2-7B, versus 22.1 for QuaRot.
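
The post does not include code, so here is a minimal PyTorch sketch of what a learnable butterfly rotation with (n log n)/2 angle parameters and O(n log n) application cost could look like, assuming the usual Givens-rotation parameterization of butterfly factors. The class name `ButterflyRotation` and the exact factor layout are illustrative, not the authors' implementation.

```python
import math

import torch
import torch.nn as nn


class ButterflyRotation(nn.Module):
    """Minimal sketch of a learnable orthogonal butterfly transform.

    For n = 2^k, the transform is a product of log2(n) sparse factors.
    Factor l pairs coordinates whose indices differ by 2^l and mixes
    each pair with a 2x2 Givens rotation parameterized by one angle,
    so the overall map is orthogonal by construction, has
    (n * log2(n)) / 2 learnable parameters, and applies in O(n log n).
    """

    def __init__(self, n: int):
        super().__init__()
        assert n & (n - 1) == 0, "n must be a power of two"
        self.n = n
        self.num_layers = int(math.log2(n))
        # One angle per coordinate pair per factor; zero init = identity.
        self.angles = nn.Parameter(torch.zeros(self.num_layers, n // 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., n) activations (or weight rows) to rotate.
        shape = x.shape
        x = x.reshape(-1, self.n)
        for layer in range(self.num_layers):
            stride = 1 << layer  # distance between paired coordinates
            x = x.reshape(-1, self.n // (2 * stride), 2, stride)
            a, b = x[:, :, 0, :], x[:, :, 1, :]
            theta = self.angles[layer].reshape(self.n // (2 * stride), stride)
            c, s = torch.cos(theta), torch.sin(theta)
            # Apply the 2x2 rotation [[c, -s], [s, c]] to every pair.
            x = torch.stack((c * a - s * b, s * a + c * b), dim=2)
            x = x.reshape(-1, self.n)
        return x.reshape(shape)
```

Because every factor is a product of 2x2 rotations, the composite transform stays exactly orthogonal throughout training, so gradient updates can search over rotations without any orthogonality penalty.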
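The post does not specify how the uniformity regularizer is defined. Purely as an illustration of the idea, the hypothetical penalty below discourages heavy-tailed (outlier-dominated) activations after rotation by penalizing kurtosis above that of a uniform distribution; treat it as a stand-in, not ButterflyQuant's actual loss term.

```python
import torch


def kurtosis_penalty(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical uniformity regularizer (illustration only).

    Penalizes the kurtosis of rotated activations above 1.8 (the
    kurtosis of a uniform distribution), pushing the distribution
    toward a flatter shape that low-bit uniform quantizers cover well.
    """
    x = x.reshape(-1).float()
    centered = x - x.mean()
    var = centered.pow(2).mean().clamp_min(eps)
    kurtosis = centered.pow(4).mean() / var.pow(2)
    return torch.relu(kurtosis - 1.8)
```

During calibration, a term like `loss = recon_loss + lam * kurtosis_penalty(rotated_acts)` could then be minimized over the 128 calibration samples mentioned above.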