Model-Preserving Adaptive Rounding

  • #LLMs
  • #Machine Learning
  • #Quantization
  • Introduces YAQA, an adaptive rounding algorithm for post-training quantization (PTQ) of LLMs.
  • Uses Kronecker-factored approximations of each linear layer's Hessian with respect to the full-model KL divergence.
  • YAQA consists of two components: Kronecker-factored sketches of the full layerwise Hessian and a quantizer-independent rounding algorithm that uses them (both illustrated in the sketches after this list).
  • Empirically reduces the KL divergence to the original model by ≈30% over existing rounding algorithms while achieving state-of-the-art performance on downstream tasks (see the KL measurement sketch below).
  • Applicable to hundred-billion-parameter LLMs and works across a wide range of models and quantizers.
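
To make the Hessian bullet concrete, here is a minimal, self-contained sketch of the Kronecker-factored idea for a single linear layer: estimate an input-side factor A from activations and an output-side factor B from output gradients, so the layerwise Hessian is approximated as A ⊗ B without ever materializing it. This is an illustration in the spirit of K-FAC, not YAQA's actual estimator; the toy loss and all names here are assumptions.

```python
import torch

torch.manual_seed(0)
d_in, d_out, n_samples = 64, 32, 256
W = torch.randn(d_out, d_in, requires_grad=True)

A = torch.zeros(d_in, d_in)    # input-side factor, E[x x^T]
B = torch.zeros(d_out, d_out)  # output-side factor, E[g g^T]

for _ in range(n_samples):
    x = torch.randn(d_in)                # stand-in calibration input
    y = x @ W.T                          # linear layer output
    # Toy scalar objective standing in for the full-model KL divergence
    # that YAQA actually targets.
    loss = torch.sum(torch.tanh(y) ** 2)
    (g,) = torch.autograd.grad(loss, y)  # dL/dy for this sample
    A += torch.outer(x, x)
    B += torch.outer(g, g)
A /= n_samples
B /= n_samples

# H ≈ A ⊗ B never has to be formed: it acts on a weight perturbation
# dW via vec(B @ dW @ A), so the quadratic loss proxy stays cheap.
dW = torch.randn(d_out, d_in)
quad = (dW * (B @ dW @ A)).sum()  # ≈ vec(dW)^T H vec(dW)
print(float(quad))
```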
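And here is a rough sketch of Hessian-aware adaptive rounding, using the classic OBQ-style error-compensation recursion (the same family of updates behind GPTQ/LDLQ): round one coordinate at a time to the grid, then shift the not-yet-rounded coordinates to absorb the error under the quadratic Hessian proxy. This is a generic stand-in, not YAQA's rounding algorithm; `grid` and the damping constant are illustrative choices.

```python
import torch

def adaptive_round(w: torch.Tensor, H: torch.Tensor, grid: float = 0.1) -> torch.Tensor:
    """Round one weight row `w` (length d) against a proxy Hessian `H` (d x d)."""
    w = w.clone()
    d = w.numel()
    # Damped inverse keeps the recursion stable on near-singular Hessians.
    Hinv = torch.linalg.inv(H + 1e-4 * torch.eye(d))
    q = torch.zeros_like(w)
    for i in range(d):
        q[i] = torch.round(w[i] / grid) * grid  # nearest grid point
        e = (w[i] - q[i]) / Hinv[i, i]
        # Optimal compensation: spread the rounding error over the
        # remaining coordinates (this also sets w[i] exactly to q[i]).
        w -= e * Hinv[:, i]
        # Eliminate coordinate i from the inverse (Gaussian elimination),
        # so later steps only adjust the not-yet-rounded coordinates.
        Hinv -= torch.outer(Hinv[:, i], Hinv[i, :]) / Hinv[i, i]
    return q

# Example: round each row of W against the input-side factor A from the
# previous sketch, as in GPTQ-style layerwise objectives.
# Wq = torch.stack([adaptive_round(row, A) for row in W.detach()])
```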
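Finally, since the headline metric is the KL divergence to the original model, here is one way such a number can be computed from the two models' logits on the same held-out text; this sketches the metric only, not the paper's exact evaluation protocol.

```python
import torch
import torch.nn.functional as F

def mean_token_kl(logits_orig: torch.Tensor, logits_quant: torch.Tensor) -> torch.Tensor:
    """Mean per-token KL(original || quantized); logits shaped (..., vocab)."""
    logp_o = F.log_softmax(logits_orig, dim=-1)
    logp_q = F.log_softmax(logits_quant, dim=-1)
    # KL(p_o || p_q) = sum_v p_o(v) * (log p_o(v) - log p_q(v))
    return (logp_o.exp() * (logp_o - logp_q)).sum(dim=-1).mean()

# Usage: run the same token ids through both models and compare.
# kl = mean_token_kl(orig_model(ids).logits, quant_model(ids).logits)
```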