Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation

7 days ago
  • #Transformers
  • #Machine Learning
  • #Self-Attention
  • Transformers using self-attention are widely used in AI models.
  • Standard self-attention's compute and memory costs grow with context length, straining resources on long sequences.
  • A new method enables self-attention computation with constant cost per token.
  • The method achieves this through a symmetry-aware Taylor approximation (see the sketch after this list).
  • It reduces memory and computation needs significantly.
  • The fixed per-token cost varies inversely with head size.
  • This enables generating an unbounded number of tokens at a modest, fixed cost.
  • Empirical validation confirms the method's correctness.
  • The techniques introduced are of independent mathematical interest.
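For intuition, here is a minimal NumPy sketch of the general idea behind Taylor-approximated, constant-cost attention. It truncates the exponential in softmax attention at degree 2 and uses a symmetry-aware feature map that keeps only the d(d+1)/2 unique quadratic monomials, so causal attention can be accumulated with fixed-size running sums. The degree-2 truncation, the function names (`sym2`, `phi`, `taylor_attention`), and the specific recurrence are illustrative assumptions, not the paper's actual algorithm or code.

```python
import numpy as np

def sym2(x):
    # Symmetry-aware degree-2 features: keep only the d*(d+1)/2 unique
    # monomials x_i * x_j (i <= j); off-diagonal terms get a sqrt(2) weight
    # so that sym2(a) @ sym2(b) == (a @ b) ** 2.
    d = x.shape[-1]
    outer = x[..., :, None] * x[..., None, :]            # (..., d, d)
    iu, ju = np.triu_indices(d)
    w = np.where(iu == ju, 1.0, np.sqrt(2.0))
    return outer[..., iu, ju] * w                        # (..., d*(d+1)//2)

def phi(x):
    # Feature map whose dot product reproduces the degree-2 Taylor expansion
    # of the exponential: phi(a) @ phi(b) == 1 + a@b + (a@b)**2 / 2.
    ones = np.ones(x.shape[:-1] + (1,))
    return np.concatenate([ones, x, sym2(x) / np.sqrt(2.0)], axis=-1)

def taylor_attention(Q, K, V):
    # Causal attention at constant cost per token: instead of the (T, T)
    # score matrix, keep two fixed-size running sums, S (F x d_v) and z (F),
    # where F = 1 + d + d*(d+1)/2. Each step costs O(F * d_v) regardless of
    # how many tokens came before.
    d = Q.shape[-1]
    scale = d ** 0.25                  # (q/scale) @ (k/scale) = q @ k / sqrt(d)
    Qf, Kf = phi(Q / scale), phi(K / scale)
    F, dv = Qf.shape[-1], V.shape[-1]
    S = np.zeros((F, dv))              # running sum of phi(k_j) v_j^T
    z = np.zeros(F)                    # running sum of phi(k_j)
    out = np.empty_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = (Qf[t] @ S) / (Qf[t] @ z)
    return out

# Quick check against exact causal softmax attention on small random inputs
# (the Taylor truncation is accurate when q @ k / sqrt(d) is small).
rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = (0.3 * rng.standard_normal((T, d)) for _ in range(3))
approx = taylor_attention(Q, K, V)
scores = Q @ K.T / np.sqrt(d)
weights = np.where(np.tril(np.ones((T, T), dtype=bool)), np.exp(scores), 0.0)
exact = (weights / weights.sum(-1, keepdims=True)) @ V
print(np.max(np.abs(approx - exact)))
```

The symmetry-aware step is what keeps the fixed cost modest: a naive degree-2 feature map would need all d^2 products q_i q_j, while exploiting the symmetry q_i q_j = q_j q_i roughly halves the state that must be stored and updated per token.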