Hasty Briefs (beta)

Compute Where It Counts: High Quality Sparsely Activated LLMs

3 days ago
  • #LLM efficiency
  • #adaptive computation
  • #sparse transformers
  • CWIC (Compute Where It Counts) is a new method for creating efficient transformers that adaptively allocate compute resources.
  • CWIC achieves a 3x increase in CPU throughput with only a 10% reduction in benchmark performance.
  • The method uses learned activation thresholds and expressive sparsity patterns to enable adaptive computation.
  • CWIC treats compute cost as a term in the loss function, learning to budget compute without labeled data or heuristics.
  • Experiments show CWIC outperforms TEAL across all FLOP reduction levels, with a 15-point average improvement at 3x FLOP reduction.
  • CWIC exhibits interpretable compute allocation, using less compute for easier tasks and formatting tokens.
  • The method learns to prune attention heads and compress attention outputs, aligning computational bases with model channels.
  • CWIC's granular sparsity patterns align with SIMD registers for efficient CPU inference.
  • The approach demonstrates emergent behaviors like allocating less compute to easier problems without explicit training.
  • CWIC will be open-sourced, with code and pretrained models available on GitHub and Hugging Face.
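The core idea behind the bullet points above, activation thresholds that gate which hidden units participate in the next matmul, can be sketched in a few lines. This is a hedged illustration, not CWIC's actual implementation: the threshold `tau`, the function name `thresholded_mlp`, and the single-scalar threshold are assumptions for clarity (the summary suggests CWIC learns thresholds and optimizes the active fraction directly as part of the loss, which this forward-only sketch does not do).

```python
import numpy as np

def thresholded_mlp(x, W1, W2, tau):
    """Forward pass of a 2-layer MLP with threshold-gated activations.

    Hidden units whose pre-activation magnitude falls below the
    threshold `tau` are zeroed, so the second matmul can skip the
    corresponding rows of W2 entirely -- the source of the FLOP
    savings. (In CWIC the thresholds are learned; here tau is a
    fixed scalar for illustration.)
    """
    h = x @ W1                        # pre-activations, shape (hidden,)
    mask = np.abs(h) >= tau           # keep only "important" activations
    h_sparse = np.where(mask, h, 0.0)
    y = h_sparse @ W2                 # a real kernel would skip masked rows
    active_frac = mask.mean()         # proxy for compute spent in layer 2
    return y, active_frac

# Toy example: with tau = 2.0, one of three hidden units is pruned,
# so roughly a third of the second matmul's FLOPs could be skipped.
x = np.array([1.0, -1.0])
W1 = np.array([[1.0, 0.0, 2.0],
               [0.0, 3.0, -1.0]])
W2 = np.ones((3, 1))
y, frac = thresholded_mlp(x, W1, W2, tau=2.0)
```

A training loop in this spirit would then add a penalty such as `loss = task_loss + lam * active_frac`, which is one plausible reading of "directly optimizing compute as a loss"; how CWIC makes the hard mask differentiable is not specified in this summary.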