Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation
- #Transformers
- #Machine Learning
- #Self-Attention
- Transformers built on self-attention are the dominant architecture in modern AI models.
- Standard self-attention incurs per-token costs that grow with context length, straining compute and memory at long contexts.
- A new method computes self-attention at constant cost per token, independent of context length.
- The efficiency comes from a symmetry-aware Taylor approximation of the exponentials in softmax attention.
- This significantly reduces memory and computation requirements.
- The per-token cost is fixed and determined by head size rather than by context length.
- This enables unbounded token generation at a modest, fixed cost.
- Empirical validation confirms the method's correctness.
- The techniques introduced may be of independent mathematical interest.
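The bullets above hinge on one identity: if the exponential in softmax attention is replaced by a truncated Taylor series, the sums over past tokens factor into query-independent accumulators. As a first-order sketch (the paper's symmetry-aware expansion is higher-order; this only illustrates the factoring), with scores $s_j = q^\top k_j$:

```latex
\exp(s_j) \;\approx\; \sum_{n=0}^{N} \frac{s_j^{\,n}}{n!}
\qquad\Longrightarrow\qquad (N = 1)
\quad
\sum_j \exp(q^\top k_j)\, v_j
\;\approx\;
\sum_j v_j \;+\; \Big(\sum_j v_j k_j^\top\Big)\, q,
\qquad
\sum_j \exp(q^\top k_j)
\;\approx\;
T \;+\; q^\top \sum_j k_j .
```

The accumulators $\sum_j v_j$, $\sum_j v_j k_j^\top$, and $\sum_j k_j$ can each be updated in $O(1)$ when a new token arrives, so generating the next token never requires revisiting the full history.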
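A minimal NumPy sketch of the idea, assuming a generic first-order Taylor linearization of the softmax exponential; this is not the paper's symmetry-aware scheme, and the function names `softmax_attention` and `taylor_linear_attention` are illustrative only. It shows how the truncated series lets key/value sums be accumulated once, giving constant cost per generated token:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: each query attends over all cached keys/values,
    # so per-token cost grows with context length T.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    W = np.exp(S - S.max(axis=-1, keepdims=True))
    W /= W.sum(axis=-1, keepdims=True)
    return W @ V

def taylor_linear_attention(Q, K, V):
    # First-order Taylor sketch: exp(s) ~ 1 + s. The feature maps below
    # satisfy phi_q(q) . phi_k(k) = 1 + (q . k) / sqrt(d), so the sums over
    # keys factor into accumulators updatable in O(1) per new token.
    d = Q.shape[-1]
    phi_q = np.concatenate([np.ones((Q.shape[0], 1)), Q / np.sqrt(d)], axis=-1)
    phi_k = np.concatenate([np.ones((K.shape[0], 1)), K], axis=-1)
    S_kv = phi_k.T @ V      # running sum of outer products, O(1) update per token
    z = phi_k.sum(axis=0)   # running normalizer sum, O(1) update per token
    return (phi_q @ S_kv) / (phi_q @ z)[:, None]
```

For small attention scores the two functions agree closely; for large scores a first-order truncation can even produce negative weights, which is why the paper uses a higher-order, symmetry-aware expansion. The sketch only demonstrates the constant-cost factoring, not the accuracy of the full method.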