Log-Linear Attention
a year ago
- #Transformers
- #Machine Learning
- #Attention Mechanism
- The attention mechanism in Transformers is central to sequence modeling but incurs quadratic compute and linear memory cost in sequence length.
- Linear attention and state-space models offer linear-time, constant-memory sequence modeling, but their fixed-size hidden state limits how much of the context they can retain (see the first sketch after this list).
- Log-linear attention is introduced as a middle ground between efficiency and expressiveness: the single fixed-size state is replaced by a logarithmically growing set of hidden states, each summarizing a segment of the prefix (second sketch below).
- Log-linear attention can be applied to existing linear attention variants and maintains matmul-rich parallelization with log-linear compute cost.
- Case studies show log-linear variants of Mamba-2 and Gated DeltaNet perform well compared to their linear-time counterparts.
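- A minimal sketch of the fixed-state bottleneck (my own illustration, not code from the paper; the function name `linear_attention` and the unnormalized form are assumed for exposition): causal linear attention squashes the whole prefix into one state matrix `S` of shape `(d_k, d_v)`, updated by a rank-1 outer product per token, so memory is constant in sequence length and compute is linear.
  ```python
  import numpy as np

  def linear_attention(Q, K, V):
      """Q, K: (T, d_k); V: (T, d_v). Causal, unnormalized linear attention."""
      T, d_k = Q.shape
      d_v = V.shape[1]
      S = np.zeros((d_k, d_v))       # single fixed-size state holding all history
      out = np.zeros((T, d_v))
      for t in range(T):
          S += np.outer(K[t], V[t])  # rank-1 update: S accumulates k_t v_t^T
          out[t] = Q[t] @ S          # o_t = q_t^T * sum_{s<=t} k_s v_s^T
      return out
  ```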
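- A second sketch of the log-linear idea (again my illustration under assumptions, not the paper's exact algorithm: it uses a binary-counter / Fenwick-tree-style partition and uniform per-level weights, whereas the paper uses data-dependent weights): the prefix is covered by O(log t) power-of-two segments, each with its own state, and the query reads all of them.
  ```python
  import numpy as np

  def log_linear_attention(Q, K, V, level_weight=None):
      """Q, K: (T, d_k); V: (T, d_v). O(log t) states per step instead of one."""
      T, d_k = Q.shape
      d_v = V.shape[1]
      stack = []                     # list of (level, state); a level-l state covers 2**l tokens
      out = np.zeros((T, d_v))
      for t in range(T):
          stack.append((0, np.outer(K[t], V[t])))   # fresh level-0 state for token t
          # merge adjacent equal-level segments, like carrying in a binary counter,
          # so at most one state per level survives -> O(log t) states in total
          while len(stack) >= 2 and stack[-1][0] == stack[-2][0]:
              lvl, s1 = stack.pop()
              _, s0 = stack.pop()
              stack.append((lvl + 1, s0 + s1))
          # the query reads every surviving state; weights are uniform here,
          # data-dependent per-level scalars in the paper
          o = np.zeros(d_v)
          for lvl, S in stack:
              w = 1.0 if level_weight is None else level_weight(t, lvl)
              o += w * (Q[t] @ S)
          out[t] = o
      return out
  ```
  With all weights fixed at 1.0 the segment states simply sum back to the single linear-attention state, so this reduces to the previous sketch; the added expressiveness comes from weighting recent, fine-grained segments differently from older, coarser ones.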