Attention is a smoothed cubic spline
- #Transformers
- #Splines
- #Artificial Intelligence
- The attention module in a transformer is identified as a smoothed cubic spline.
- With ReLU activation, various forms of attention (masked, encoder-decoder) are shown to be cubic splines (a worked formula follows this list).
- All components of a transformer (encoder, decoder, etc.) are constructed from compositions of attention modules and feedforward neural networks, making them cubic or higher-order splines.
- In the other direction, assuming the Pierce–Birkhoff conjecture, every spline can be expressed as a ReLU-activated encoder.
- Replacing ReLU with the smooth softmax yields the original transformer, providing a $C^\infty$-smooth version (a minimal NumPy sketch contrasting the two activations follows this list).
- This insight frames the transformer entirely in terms of splines, well-understood objects in applied mathematics.
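- As a worked reading of the spline claim (a sketch using the standard single-head parameterization, which this summary assumes rather than quotes from the paper): with input $X \in \mathbb{R}^{n \times d}$ and weight matrices $W_Q, W_K, W_V \in \mathbb{R}^{d \times d_k}$,

  $$\operatorname{Att}_\varphi(X) \;=\; \varphi\!\left(\frac{(XW_Q)(XW_K)^\top}{\sqrt{d_k}}\right) XW_V .$$

  Each pre-activation score is a quadratic polynomial in the entries of $X$; applying $\varphi = \mathrm{ReLU}$ entrywise keeps the result piecewise quadratic, and the final multiplication by $XW_V$ (linear in $X$) makes the whole map piecewise cubic, i.e. a cubic spline, whereas $\varphi = \mathrm{softmax}$ (applied row-wise) makes the same map $C^\infty$-smooth.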
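- A minimal runnable sketch (NumPy; the shapes, names, and random weights are illustrative assumptions, not the authors' code) contrasting ReLU-activated attention with the usual softmax attention:

```python
import numpy as np


def relu(z):
    """Entrywise ReLU; keeps the attention map piecewise polynomial in X."""
    return np.maximum(z, 0.0)


def softmax(z, axis=-1):
    """Row-wise softmax; the smooth activation of the original transformer."""
    z = z - z.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(X, W_q, W_k, W_v, activation):
    """Single-head attention: activation((X W_q)(X W_k)^T / sqrt(d_k)) @ (X W_v)."""
    d_k = W_k.shape[1]
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d_k)  # quadratic in the entries of X
    return activation(scores) @ (X @ W_v)            # times another linear factor of X


rng = np.random.default_rng(0)
n, d, d_k = 4, 8, 8                                  # sequence length, model dim, head dim (illustrative)
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d_k)) for _ in range(3))

Y_relu = attention(X, W_q, W_k, W_v, relu)     # piecewise cubic in X: a cubic spline
Y_soft = attention(X, W_q, W_k, W_v, softmax)  # C-infinity smooth in X

print(Y_relu.shape, Y_soft.shape)  # (4, 8) (4, 8)
```

  Swapping `relu` for `softmax` is the only change between the piecewise-cubic variant and the smooth one.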