Theoretical Analysis of Positional Encodings in Transformer Models
- #transformer-models
- #positional-encodings
- #machine-learning
- Positional encodings are essential in transformer models: self-attention is permutation-invariant, so position information must be injected to process sequential data without recurrence.
- The paper introduces a theoretical framework for analyzing common positional encoding methods (sinusoidal, learned, relative, and ALiBi); a sketch of the standard sinusoidal scheme appears after this list.
- Expressiveness is characterized via function approximation, and generalization bounds are established using Rademacher complexity (the standard definition is recalled below).
- New encoding methods based on orthogonal function families (wavelets, Legendre polynomials) are proposed; see the illustrative Legendre sketch below.
- The orthogonal transform-based encodings are shown to outperform traditional sinusoidal encodings in both generalization and extrapolation to sequence lengths beyond those seen in training.
- The work provides insights for transformer design in NLP, computer vision, and other applications.
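For reference, here is a minimal sketch of the sinusoidal baseline the paper analyzes. The formula is the standard one from Vaswani et al. (2017); the function name and NumPy phrasing are illustrative, not the paper's code.

```python
import numpy as np

def sinusoidal_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017).

    Assumes d_model is even.
    """
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions: cosine
    return pe
```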
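The paper's specific bounds per encoding family are not reproduced in this summary, but the quantity they build on is standard. For a function class $\mathcal{F}$ and a sample $S = (x_1, \dots, x_m)$, the empirical Rademacher complexity and the generic generalization bound it plugs into (for $f$ taking values in $[0, 1]$) are:

```latex
\hat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\boldsymbol{\sigma}}\!\left[ \sup_{f \in \mathcal{F}}
      \frac{1}{m} \sum_{i=1}^{m} \sigma_i \, f(x_i) \right],
  \qquad \sigma_i \ \text{i.i.d. uniform on } \{-1, +1\},

% with probability at least 1 - \delta over the draw of S, for all f in \mathcal{F}:
\mathbb{E}[f(x)]
  \le \frac{1}{m} \sum_{i=1}^{m} f(x_i)
    + 2\,\hat{\mathfrak{R}}_S(\mathcal{F})
    + 3\sqrt{\frac{\ln(2/\delta)}{2m}}.
```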
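The summary does not spell out the paper's exact orthogonal-function construction, so the following is one plausible instantiation, not the authors' method: evaluate the first d_model Legendre polynomials (orthogonal on $[-1, 1]$) at positions rescaled into that interval. All names here are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def legendre_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Hypothetical Legendre-polynomial positional encoding (illustrative only).

    Column k holds P_k evaluated at positions mapped onto [-1, 1],
    the interval on which the Legendre polynomials are orthogonal.
    """
    # Rescale positions 0 .. seq_len-1 to [-1, 1].
    x = 2.0 * np.arange(seq_len) / max(seq_len - 1, 1) - 1.0
    pe = np.empty((seq_len, d_model))
    for k in range(d_model):
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0              # coefficient vector selecting P_k
        pe[:, k] = legval(x, coeffs)
    return pe
```

One convenient property of this choice: like the sinusoids, every column is bounded by 1 on $[-1, 1]$, so the encoding's scale is stable across dimensions.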