BackSlash: Rate Constrained Optimized Training of Large Language Models

a year ago
  • #Model Compression
  • #Machine Learning
  • #Large Language Models
  • Introduces Rate-Constrained Training (BackSlash), a novel training-time compression approach for large language models (LLMs).
  • Based on rate-distortion optimization (RDO), enabling a flexible trade-off between model accuracy and complexity (a training-step sketch follows this list).
  • Reduces memory usage by 60%-90% without accuracy loss, outperforming post-training compression methods.
  • Enhances generalization when trained with small Lagrange multipliers, and improves robustness to pruning at rates up to 80% (see the pruning sketch after this list).
  • Facilitates network simplification for accelerated inference on edge devices.
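As a rough illustration of the rate-distortion trade-off summarized above, the sketch below adds a rate penalty to an ordinary training step. It assumes a simple Laplacian-style code-length proxy (the sum of absolute weights) for the rate term; `rate_proxy`, `train_step`, `lam`, and the toy model are hypothetical names, not the paper's actual implementation.

```python
# Minimal sketch of rate-constrained training (assumption: a Laplacian
# weight prior, so sum(|w|) approximates code length; `lam` is the
# Lagrange multiplier trading distortion against rate).
import torch
import torch.nn as nn

def rate_proxy(model: nn.Module) -> torch.Tensor:
    # Under a Laplacian prior, each weight's ideal code length grows
    # linearly with its magnitude, so sum(|w|) serves as a rate estimate.
    return sum(p.abs().sum() for p in model.parameters())

def train_step(model, batch, loss_fn, optimizer, lam=1e-5):
    x, y = batch
    optimizer.zero_grad()
    distortion = loss_fn(model(x), y)     # task loss = "distortion" D
    rate = rate_proxy(model)              # estimated model code length R
    (distortion + lam * rate).backward()  # Lagrangian objective D + lam*R
    optimizer.step()
    return distortion.item(), rate.item()

# Toy usage on random data.
model = nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batch = (torch.randn(8, 16), torch.randint(0, 4, (8,)))
d, r = train_step(model, batch, nn.CrossEntropyLoss(), opt)
```

A larger `lam` biases training toward a smaller encoded model; the generalization point in the list corresponds to keeping `lam` small.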
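The pruning-robustness claim can be checked with a global magnitude-pruning pass like the one below; `magnitude_prune` and its 0.8 default are illustrative, chosen only to mirror the quoted 80% pruning rate.

```python
# Hedged sketch of a pruning-robustness check: zero the smallest-magnitude
# weights at a target sparsity, then re-evaluate the model. The function
# name and defaults are illustrative, not from the paper.
import torch
import torch.nn as nn

@torch.no_grad()
def magnitude_prune(model: nn.Module, sparsity: float = 0.8) -> None:
    # Global magnitude pruning: find the magnitude threshold below which
    # the requested fraction of weights falls, then zero those weights.
    all_w = torch.cat([p.abs().flatten() for p in model.parameters()])
    threshold = torch.quantile(all_w, sparsity)
    for p in model.parameters():
        p.mul_((p.abs() > threshold).float())

model = nn.Linear(16, 4)
magnitude_prune(model, sparsity=0.8)  # roughly 80% of weights zeroed
```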