Log-Linear Attention

  • #Transformers
  • #Machine Learning
  • #Attention Mechanism
  • The attention mechanism in Transformers is crucial for sequence modeling but incurs quadratic compute and linear memory cost in sequence length.
  • Linear attention and state-space models offer linear-time, constant-memory sequence modeling but are limited by their fixed-size hidden state.
  • Log-linear attention is introduced as a mechanism that balances efficiency and expressiveness by maintaining a logarithmically growing set of hidden states instead of a single fixed-size one (a sketch of the bucketing idea follows this list).
  • Log-linear attention can be applied to existing linear attention variants and maintains matmul-rich parallelization with log-linear compute cost.
  • Case studies show that log-linear variants of Mamba-2 and Gated DeltaNet perform favorably against their linear-time counterparts.
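
The bucketed-hidden-state idea can be illustrated with a short sketch. The NumPy snippet below is a naive, sequential reference for the decoding form, under the assumption that log-linear attention splits each causal prefix into Fenwick-tree-style buckets, keeps one linear-attention state per bucket, and mixes the buckets with data-dependent scalar weights. The names `fenwick_partition`, `log_linear_attention_decode`, and the weight array `lam` are illustrative, not from the paper, and a real implementation would maintain the O(log T) states incrementally with a chunked, matmul-rich parallel form rather than recomputing each bucket as done here.

```python
import numpy as np

def fenwick_partition(t):
    """Split the prefix of length t into O(log t) contiguous buckets,
    following a Fenwick-tree (binary-indexed-tree) decomposition:
    each bucket's length is the lowest set bit of its right endpoint."""
    buckets, hi = [], t
    while hi > 0:
        size = hi & (-hi)                  # lowest set bit of hi
        buckets.append((hi - size, hi))    # half-open [start, end), 0-indexed
        hi -= size
    return buckets                         # most recent bucket first

def log_linear_attention_decode(q, k, v, lam):
    """Naive sequential reference for log-linear attention decoding.

    q, k, v : (T, d) arrays of queries, keys, values.
    lam     : (T, L) data-dependent bucket weights with
              L >= floor(log2(T)) + 1 (in a model these would come from
              a small learned head; here they are plain inputs).

    At step t the causal prefix is split into Fenwick buckets; bucket l
    contributes its linear-attention state S_l = sum_i k_i v_i^T, scaled
    by lam[t, l], so only O(log t) states are combined per token.
    """
    T, d = q.shape
    out = np.zeros((T, d))
    for t in range(1, T + 1):
        y = np.zeros(d)
        for l, (s, e) in enumerate(fenwick_partition(t)):
            S = k[s:e].T @ v[s:e]          # (d, d) bucket state, recomputed here for clarity
            y += lam[t - 1, l] * (q[t - 1] @ S)
        out[t - 1] = y
    return out

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
T, d = 16, 8
L = int(np.log2(T)) + 1
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
lam = rng.random((T, L))
print(log_linear_attention_decode(q, k, v, lam).shape)  # (16, 8)
```

As a sanity check on the sketch, setting all weights in `lam` to 1 makes the bucket states sum back to the full prefix state, recovering ordinary linear attention; the extra expressiveness comes from letting the model weight recent and distant buckets differently.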