BackSlash: Rate Constrained Optimized Training of Large Language Models
- #Model Compression
- #Machine Learning
- #Large Language Models
- Introduces Rate-Constrained Training (BackSlash), a novel training-time compression approach for large language models (LLMs).
- Based on rate-distortion optimization (RDO), enabling a flexible trade-off between model accuracy and complexity (see the sketch after this list).
- Reduces memory usage by 60%-90% without accuracy loss, outperforming post-training compression methods.
- Enhances generalization when small Lagrange multipliers are used, and improves robustness to pruning, tolerating pruning rates of up to 80%.
- Facilitates network simplification for accelerated inference on edge devices.
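A minimal sketch of what a rate-constrained training step could look like, assuming a standard PyTorch setup. BackSlash derives its rate term from an entropy model of the parameters; here a simple L1 penalty (the negative log-likelihood of the weights under a Laplacian prior, up to constants) stands in as a differentiable rate surrogate, and `lam` plays the role of the Lagrange multiplier. The names `rate_proxy` and `rdo_loss` are illustrative, not the paper's API.

```python
import torch
import torch.nn as nn

def rate_proxy(model: nn.Module) -> torch.Tensor:
    """Differentiable surrogate for the parameter bit-rate.

    Assumption: an L1 term (Laplacian prior on the weights) stands in for
    the entropy-coded rate used in the paper.
    """
    return sum(p.abs().sum() for p in model.parameters())

def rdo_loss(model: nn.Module, logits: torch.Tensor,
             targets: torch.Tensor, lam: float = 1e-6) -> torch.Tensor:
    """Rate-distortion objective: task loss (distortion) + lam * rate."""
    distortion = nn.functional.cross_entropy(logits, targets)
    return distortion + lam * rate_proxy(model)

# Usage: a toy classifier and one optimization step.
model = nn.Linear(16, 4)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

opt.zero_grad()
loss = rdo_loss(model, model(x), y, lam=1e-4)
loss.backward()
opt.step()
```

Smaller values of `lam` weight the distortion term more heavily, consistent with the note above that small Lagrange multipliers favor generalization; larger values push the weight distribution toward lower entropy and hence a smaller compressed model.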