Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation
- #Transformers
- #Machine Learning
- #Self-Attention
- Transformers built on self-attention are the dominant architecture in modern AI models.
- Standard self-attention incurs per-token costs that grow with context length, straining compute and memory at long contexts.
- A new method computes self-attention at constant cost per token, independent of context length.
- The efficiency comes from a symmetry-aware Taylor approximation of the exponentials in softmax attention.
- This significantly reduces memory and computation requirements.
- The per-token cost is fixed and determined by head size rather than by context length.
- This enables unbounded token generation at a modest, fixed cost.
- Empirical validation confirms the method's correctness.
- The techniques introduced may be of independent mathematical interest.
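The bullets above hinge on one identity: if the exponential in softmax attention is replaced by a truncated Taylor series, the sums over past tokens factor into query-independent accumulators. As a first-order sketch (the paper's symmetry-aware expansion is higher-order; this only illustrates the factoring), with scores $s_j = q^\top k_j$:

```latex
\exp(s_j) \;\approx\; \sum_{n=0}^{N} \frac{s_j^{\,n}}{n!}
\qquad\Longrightarrow\qquad (N = 1)
\quad
\sum_j \exp(q^\top k_j)\, v_j
\;\approx\;
\sum_j v_j \;+\; \Big(\sum_j v_j k_j^\top\Big)\, q,
\qquad
\sum_j \exp(q^\top k_j)
\;\approx\;
T \;+\; q^\top \sum_j k_j .
```

The accumulators $\sum_j v_j$, $\sum_j v_j k_j^\top$, and $\sum_j k_j$ can each be updated in $O(1)$ when a new token arrives, so generating the next token never requires revisiting the full history.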
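A minimal NumPy sketch of the idea, assuming a generic first-order Taylor linearization of the softmax exponential; this is not the paper's symmetry-aware scheme, and the function names `softmax_attention` and `taylor_linear_attention` are illustrative only. It shows how the truncated series lets key/value sums be accumulated once, giving constant cost per generated token:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: each query attends over all cached keys/values,
    # so per-token cost grows with context length T.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    W = np.exp(S - S.max(axis=-1, keepdims=True))
    W /= W.sum(axis=-1, keepdims=True)
    return W @ V

def taylor_linear_attention(Q, K, V):
    # First-order Taylor sketch: exp(s) ~ 1 + s. The feature maps below
    # satisfy phi_q(q) . phi_k(k) = 1 + (q . k) / sqrt(d), so the sums over
    # keys factor into accumulators updatable in O(1) per new token.
    d = Q.shape[-1]
    phi_q = np.concatenate([np.ones((Q.shape[0], 1)), Q / np.sqrt(d)], axis=-1)
    phi_k = np.concatenate([np.ones((K.shape[0], 1)), K], axis=-1)
    S_kv = phi_k.T @ V      # running sum of outer products, O(1) update per token
    z = phi_k.sum(axis=0)   # running normalizer sum, O(1) update per token
    return (phi_q @ S_kv) / (phi_q @ z)[:, None]
```

For small attention scores the two functions agree closely; for large scores a first-order truncation can even produce negative weights, which is why the paper uses a higher-order, symmetry-aware expansion. The sketch only demonstrates the constant-cost factoring, not the accuracy of the full method.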