SparseLoCo: Communication-Efficient LLM Training
9 days ago
- #Machine Learning
- #Communication Efficiency
- #Large Language Models
- SparseLoCo is a communication-efficient training algorithm for Large Language Models (LLMs).
- It combines Top-k sparsification with 2-bit quantization to reach extreme compression ratios, transmitting only roughly 1-3% of the pseudo-gradient entries (a compression sketch follows this list).
- DiLoCo-style outer momentum can be locally approximated by an error-feedback accumulator combined with aggressive sparsification (see the outer-step sketch below).
- Aggregating only the sparse Top-k components can actually improve model quality compared with dense aggregation.
- SparseLoCo outperforms full-precision DiLoCo in communication-constrained LLM training settings.
- The method reduces both communication frequency and per-round bandwidth, which is particularly valuable for bandwidth-constrained cross-datacenter links.
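
Below is a minimal sketch of the Top-k-plus-quantization idea, written in PyTorch. It is not the authors' implementation; the function names (`topk_compress`, `topk_decompress`), the uniform symmetric 2-bit quantizer, and the payload layout are illustrative assumptions.

```python
import torch

def topk_compress(tensor: torch.Tensor, density: float = 0.01, bits: int = 2):
    """Keep the largest-magnitude `density` fraction of entries and quantize
    the surviving values with a uniform symmetric `bits`-bit quantizer."""
    flat = tensor.flatten()
    k = max(1, int(density * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)          # positions of the kept entries
    kept = flat[indices]

    levels = 2 ** bits - 1
    scale = kept.abs().max().clamp(min=1e-12)       # per-message scale factor
    q = torch.round((kept / scale + 1.0) / 2.0 * levels)  # [-scale, scale] -> {0..levels}
    return indices, q.to(torch.uint8), scale

def topk_decompress(indices, q, scale, numel: int, bits: int = 2):
    """Rebuild a dense tensor from the sparse, quantized payload."""
    levels = 2 ** bits - 1
    dense = torch.zeros(numel)
    dense[indices] = (q.float() / levels * 2.0 - 1.0) * scale
    return dense

# Example: compress a pseudo-gradient to ~2% density with 2-bit values.
grad = torch.randn(1_000_000)
idx, q, scale = topk_compress(grad, density=0.02, bits=2)
approx = topk_decompress(idx, q, scale, grad.numel(), bits=2)
```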
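
And a sketch of how an error-feedback buffer and sparse aggregation could fit into a DiLoCo-style outer step. This is a single-process simulation of the round structure, not the paper's recipe; the plain SGD outer update, reuse of the `topk_compress`/`topk_decompress` helpers above, and the hyperparameter values are all assumptions.

```python
import torch

def outer_step(global_params, worker_params, error_buffers,
               density: float = 0.01, outer_lr: float = 0.7):
    """One DiLoCo-style outer round with Top-k compression and error feedback.

    global_params : shared model parameters (one flat tensor for simplicity).
    worker_params : list of per-worker parameters after their local inner steps.
    error_buffers : list of per-worker flat tensors holding untransmitted residue.
    """
    numel = global_params.numel()
    aggregate = torch.zeros(numel)

    for w, (params, err) in enumerate(zip(worker_params, error_buffers)):
        # Pseudo-gradient: the worker's drift from the global parameters over
        # its inner steps, plus the residue it did not transmit last round.
        pseudo_grad = (global_params - params).flatten() + err

        # Communicate only the Top-k entries, quantized to 2 bits.
        idx, q, scale = topk_compress(pseudo_grad, density=density, bits=2)
        sent = topk_decompress(idx, q, scale, numel, bits=2)

        # Error feedback: whatever was not sent stays local for the next round.
        error_buffers[w] = pseudo_grad - sent
        aggregate += sent

    # Sparse aggregation across workers, then a plain SGD outer update.
    aggregate /= len(worker_params)
    return global_params - outer_lr * aggregate.view_as(global_params)
```

With `density=1.0` and full-precision values this reduces to a vanilla DiLoCo-style outer step with SGD; the error buffer carries forward whatever was not transmitted, which is the quantity the summary above relates to outer momentum.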