ButterflyQuant: Ultra-low-bit LLM Quantization
- #large language models
- #machine learning
- #quantization
- ButterflyQuant introduces learnable orthogonal butterfly transforms for ultra-low-bit LLM quantization.
- Addresses the issue of catastrophic performance loss in 2-bit quantization due to activation outliers.
- Replaces fixed Hadamard transforms with continuous, learnable butterfly transforms for layer-adaptive rotations (see the first code sketch after this list).
- Ensures orthogonality by construction, providing theoretical guarantees for outlier suppression.
- Achieves O(n log n) computational complexity with only (n log n)/2 learnable parameters.
- Introduces uniformity regularization to promote smoother activation distributions for better quantization (an illustrative form appears in the second sketch below).
- Requires minimal calibration (128 samples) and converges quickly on a single GPU.
- Demonstrates superior 2-bit results: 15.4 perplexity on LLaMA-2-7B, versus 22.1 for QuaRot.
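
The post does not include code, so here is a minimal PyTorch sketch of what a learnable butterfly rotation with (n log n)/2 angle parameters and O(n log n) application cost could look like, assuming the usual Givens-rotation parameterization of butterfly factors. The class name `ButterflyRotation` and the exact factor layout are illustrative, not the authors' implementation.

```python
import math

import torch
import torch.nn as nn


class ButterflyRotation(nn.Module):
    """Minimal sketch of a learnable orthogonal butterfly transform.

    For n = 2^k, the transform is a product of log2(n) sparse factors.
    Factor l pairs coordinates whose indices differ by 2^l and mixes
    each pair with a 2x2 Givens rotation parameterized by one angle,
    so the overall map is orthogonal by construction, has
    (n * log2(n)) / 2 learnable parameters, and applies in O(n log n).
    """

    def __init__(self, n: int):
        super().__init__()
        assert n & (n - 1) == 0, "n must be a power of two"
        self.n = n
        self.num_layers = int(math.log2(n))
        # One angle per coordinate pair per factor; zero init = identity.
        self.angles = nn.Parameter(torch.zeros(self.num_layers, n // 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., n) activations (or weight rows) to rotate.
        shape = x.shape
        x = x.reshape(-1, self.n)
        for layer in range(self.num_layers):
            stride = 1 << layer  # distance between paired coordinates
            x = x.reshape(-1, self.n // (2 * stride), 2, stride)
            a, b = x[:, :, 0, :], x[:, :, 1, :]
            theta = self.angles[layer].reshape(self.n // (2 * stride), stride)
            c, s = torch.cos(theta), torch.sin(theta)
            # Apply the 2x2 rotation [[c, -s], [s, c]] to every pair.
            x = torch.stack((c * a - s * b, s * a + c * b), dim=2)
            x = x.reshape(-1, self.n)
        return x.reshape(shape)
```

Because every factor is a product of 2x2 rotations, the composite transform stays exactly orthogonal throughout training, so gradient updates can search over rotations without any orthogonality penalty.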
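The post does not specify how the uniformity regularizer is defined. Purely as an illustration of the idea, the hypothetical penalty below discourages heavy-tailed (outlier-dominated) activations after rotation by penalizing kurtosis above that of a uniform distribution; treat it as a stand-in, not ButterflyQuant's actual loss term.

```python
import torch


def kurtosis_penalty(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical uniformity regularizer (illustration only).

    Penalizes the kurtosis of rotated activations above 1.8 (the
    kurtosis of a uniform distribution), pushing the distribution
    toward a flatter shape that low-bit uniform quantizers cover well.
    """
    x = x.reshape(-1).float()
    centered = x - x.mean()
    var = centered.pow(2).mean().clamp_min(eps)
    kurtosis = centered.pow(4).mean() / var.pow(2)
    return torch.relu(kurtosis - 1.8)
```

During calibration, a term like `loss = recon_loss + lam * kurtosis_penalty(rotated_acts)` could then be minimized over the 128 calibration samples mentioned above.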