Theoretical Analysis of Positional Encodings in Transformer Models
- #transformer-models
- #positional-encodings
- #machine-learning
- Positional encodings are essential in transformer models: self-attention is permutation-invariant, so position information must be injected to process sequential data without recurrence.
- The paper introduces a theoretical framework for analyzing common positional encoding methods (sinusoidal, learned, relative, and ALiBi); a sketch of the standard sinusoidal scheme appears after this list.
- Expressiveness is characterized via function approximation, and generalization bounds are established using Rademacher complexity (the standard definition is recalled below).
- New encoding methods based on orthogonal function families (wavelets, Legendre polynomials) are proposed; see the illustrative Legendre sketch below.
- The orthogonal transform-based encodings are shown to outperform traditional sinusoidal encodings in both generalization and extrapolation to sequence lengths beyond those seen in training.
- The work provides insights for transformer design in NLP, computer vision, and other applications.
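For reference, here is a minimal sketch of the sinusoidal baseline the paper analyzes. The formula is the standard one from Vaswani et al. (2017); the function name and NumPy phrasing are illustrative, not the paper's code.

```python
import numpy as np

def sinusoidal_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017).

    Assumes d_model is even.
    """
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions: cosine
    return pe
```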
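The paper's specific bounds per encoding family are not reproduced in this summary, but the quantity they build on is standard. For a function class $\mathcal{F}$ and a sample $S = (x_1, \dots, x_m)$, the empirical Rademacher complexity and the generic generalization bound it plugs into (for $f$ taking values in $[0, 1]$) are:

```latex
\hat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\boldsymbol{\sigma}}\!\left[ \sup_{f \in \mathcal{F}}
      \frac{1}{m} \sum_{i=1}^{m} \sigma_i \, f(x_i) \right],
  \qquad \sigma_i \ \text{i.i.d. uniform on } \{-1, +1\},

% with probability at least 1 - \delta over the draw of S, for all f in \mathcal{F}:
\mathbb{E}[f(x)]
  \le \frac{1}{m} \sum_{i=1}^{m} f(x_i)
    + 2\,\hat{\mathfrak{R}}_S(\mathcal{F})
    + 3\sqrt{\frac{\ln(2/\delta)}{2m}}.
```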
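The summary does not spell out the paper's exact orthogonal-function construction, so the following is one plausible instantiation, not the authors' method: evaluate the first d_model Legendre polynomials (orthogonal on $[-1, 1]$) at positions rescaled into that interval. All names here are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def legendre_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Hypothetical Legendre-polynomial positional encoding (illustrative only).

    Column k holds P_k evaluated at positions mapped onto [-1, 1],
    the interval on which the Legendre polynomials are orthogonal.
    """
    # Rescale positions 0 .. seq_len-1 to [-1, 1].
    x = 2.0 * np.arange(seq_len) / max(seq_len - 1, 1) - 1.0
    pe = np.empty((seq_len, d_model))
    for k in range(d_model):
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0              # coefficient vector selecting P_k
        pe[:, k] = legval(x, coeffs)
    return pe
```

One convenient property of this choice: like the sinusoids, every column is bounded by 1 on $[-1, 1]$, so the encoding's scale is stable across dimensions.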