SparseLoCo: Communication-Efficient LLM Training

  • #Machine Learning
  • #Communication Efficiency
  • #Large Language Models
  • SparseLoCo is a communication-efficient training algorithm for Large Language Models (LLMs).
  • It leverages Top-k sparsification and quantization to reach extreme compression: only the top 1-3% of pseudo-gradient entries are kept, and those are quantized to 2 bits (a sketch follows this list).
  • Outer momentum can be locally approximated by error feedback combined with aggressive sparsity.
  • Aggregating only the sparse Top-k entries, rather than densely averaging full pseudo-gradients, can itself improve model performance (see the aggregation sketch below).
  • SparseLoCo outperforms full-precision DiLoCo in communication-constrained LLM training settings.
  • The method reduces both communication frequency and bandwidth requirements, which is valuable for bandwidth-limited cross-datacenter links.
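
To make the compression step concrete, here is a minimal sketch of Top-k sparsification with error feedback and a toy 2-bit quantizer, assuming NumPy. This is not the authors' implementation; `compress_pseudo_gradient`, `quantize_2bit`, and `k_fraction` are illustrative names, and the quantizer is a deliberately simple uniform stand-in.

```python
# A minimal sketch (not the paper's code) of a SparseLoCo-style compressor:
# Top-k sparsification with error feedback, plus a toy 2-bit quantizer.
import numpy as np

def quantize_2bit(values: np.ndarray) -> np.ndarray:
    """Toy uniform 2-bit quantizer over the value range (illustrative only)."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        return values.copy()
    levels = 3  # 2 bits -> 4 levels, i.e. 3 intervals
    codes = np.round((values - lo) / (hi - lo) * levels)
    return lo + codes * (hi - lo) / levels

def compress_pseudo_gradient(pseudo_grad, error_buf, k_fraction=0.02):
    """Compress one worker's pseudo-gradient; return (indices, values, new error).

    The residual that Top-k leaves behind is accumulated in `error_buf` and
    re-added next round; this error feedback is what lets aggressive sparsity
    stand in for outer momentum."""
    corrected = pseudo_grad + error_buf           # apply accumulated residual
    k = max(1, int(k_fraction * corrected.size))  # keep only ~2% of entries
    flat = corrected.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of top-k magnitudes
    vals = quantize_2bit(flat[idx])               # 2-bit quantize the survivors
    new_error = corrected.copy()                  # everything not transmitted
    new_error.ravel()[idx] -= vals                # ...minus what was sent
    return idx, vals, new_error
```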
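
Below is an equally hedged sketch of the sparse aggregation and outer step: each worker ships only its Top-k indices and quantized values, and each coordinate is averaged over the workers that actually transmitted it. The worker count, the random stand-in pseudo-gradients, and the outer learning rate of 0.1 are all placeholder assumptions.

```python
# One communication round for two workers sharing a flat parameter vector.
rng = np.random.default_rng(0)
params = rng.normal(size=1_000)
errors = [np.zeros_like(params) for _ in range(2)]  # per-worker error feedback
agg = np.zeros_like(params)
counts = np.zeros_like(params)

for w in range(2):
    pseudo_grad = rng.normal(size=params.shape)     # stand-in pseudo-gradient
    idx, vals, errors[w] = compress_pseudo_gradient(pseudo_grad, errors[w])
    agg[idx] += vals                                # sparse aggregation...
    counts[idx] += 1                                # ...per-coordinate count

# Average each coordinate over the workers that sent it; untouched entries stay 0.
update = np.where(counts > 0, agg / np.maximum(counts, 1), 0.0)
params -= 0.1 * update                              # outer step, assumed lr
```

Averaging each coordinate only over the workers that contributed it, rather than dividing by the full worker count, is one plausible reading of sparse aggregation; the paper's exact aggregation rule may differ.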