BackSlash: Rate Constrained Optimized Training of Large Language Models
- #Model Compression
- #Machine Learning
- #Large Language Models
- Introduces Rate-Constrained Training (BackSlash), a novel training-time compression approach for large language models (LLMs).
- Based on rate-distortion optimization (RDO), enabling a flexible trade-off between model accuracy and complexity (see the sketch after this list).
- Reduces memory usage by 60%-90% without accuracy loss, outperforming post-training compression methods.
- Enhances generalization when small Lagrange multipliers are used, and improves robustness to pruning, tolerating pruning rates of up to 80%.
- Facilitates network simplification for accelerated inference on edge devices.
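A minimal sketch of what a rate-constrained training step could look like, assuming a standard PyTorch setup. BackSlash derives its rate term from an entropy model of the parameters; here a simple L1 penalty (the negative log-likelihood of the weights under a Laplacian prior, up to constants) stands in as a differentiable rate surrogate, and `lam` plays the role of the Lagrange multiplier. The names `rate_proxy` and `rdo_loss` are illustrative, not the paper's API.

```python
import torch
import torch.nn as nn

def rate_proxy(model: nn.Module) -> torch.Tensor:
    """Differentiable surrogate for the parameter bit-rate.

    Assumption: an L1 term (Laplacian prior on the weights) stands in for
    the entropy-coded rate used in the paper.
    """
    return sum(p.abs().sum() for p in model.parameters())

def rdo_loss(model: nn.Module, logits: torch.Tensor,
             targets: torch.Tensor, lam: float = 1e-6) -> torch.Tensor:
    """Rate-distortion objective: task loss (distortion) + lam * rate."""
    distortion = nn.functional.cross_entropy(logits, targets)
    return distortion + lam * rate_proxy(model)

# Usage: a toy classifier and one optimization step.
model = nn.Linear(16, 4)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

opt.zero_grad()
loss = rdo_loss(model, model(x), y, lam=1e-4)
loss.backward()
opt.step()
```

Smaller values of `lam` weight the distortion term more heavily, consistent with the note above that small Lagrange multipliers favor generalization; larger values push the weight distribution toward lower entropy and hence a smaller compressed model.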