Theoretical Analysis of Positional Encodings in Transformer Models

10 months ago
  • #transformer-models
  • #positional-encodings
  • #machine-learning
  • Positional encodings are essential in transformers: self-attention alone is order-invariant, so they are what lets the model process sequential data without recurrence.
  • The paper introduces a theoretical framework for analyzing and comparing positional encoding methods (sinusoidal, learned, relative, ALiBi); two of these are sketched in code after this list.
  • Expressiveness is defined via function approximation, and generalization bounds are established using Rademacher complexity (the standard machinery is recalled below).
  • New encoding methods based on orthogonal functions (wavelets, Legendre polynomials) are proposed; a Legendre sketch follows the list.
  • Encodings based on orthogonal transforms outperform traditional sinusoidal encodings in both generalization and extrapolation to longer sequences.
  • The work provides insights for transformer design in NLP, computer vision, and other applications.
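For concreteness, here is a minimal NumPy sketch of two of the surveyed schemes: the fixed sinusoidal encoding of Vaswani et al. and the ALiBi attention bias of Press et al. The function names are ours, and the symmetric (non-causal) form of the ALiBi bias is a simplification of the original causal variant.

```python
import numpy as np

def sinusoidal_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encoding (Vaswani et al., 2017); d_model assumed even."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)  # (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

def alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """ALiBi (Press et al., 2022): a per-head linear distance penalty added to
    attention logits instead of adding vectors to the token embeddings."""
    # Head slopes form a geometric sequence, e.g. 2^-1 ... 2^-8 for 8 heads.
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    # Symmetric |i - j| distance; the original paper uses a causal mask.
    dist = np.abs(np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None])
    return -slopes[:, None, None] * dist[None, :, :]  # (num_heads, seq_len, seq_len)
```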
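The Rademacher-complexity machinery referenced in the summary is standard; the generic form of such a bound is recalled below (the paper's exact function class and constants are not reproduced here):

```latex
% Empirical Rademacher complexity of a class \mathcal{F} on a sample S = (x_1, \dots, x_n):
\hat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\sigma}\!\left[\sup_{f \in \mathcal{F}}
      \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\right],
  \qquad \sigma_i \sim \mathrm{Uniform}\{-1, +1\}.

% Standard generalization bound: with probability at least 1 - \delta,
% for all f \in \mathcal{F},
L(f) \;\le\; \hat{L}(f) + 2\,\hat{\mathfrak{R}}_S(\mathcal{F})
      + 3\sqrt{\frac{\log(2/\delta)}{2n}}.
```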
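Finally, a sketch of the orthogonal-function idea using Legendre polynomials. Rescaling positions to [-1, 1] (the polynomials' orthogonality interval) and the function name are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.special import eval_legendre

def legendre_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Encode position t as (P_0(x_t), ..., P_{d_model-1}(x_t)), where the
    P_k are Legendre polynomials, orthogonal on [-1, 1]."""
    x = np.linspace(-1.0, 1.0, seq_len)  # positions rescaled to [-1, 1]
    # Rows index positions, columns index polynomial degree: (seq_len, d_model).
    return np.stack([eval_legendre(k, x) for k in range(d_model)], axis=1)
```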