Attention is a smoothed cubic spline
- #Transformers
- #Splines
- #Artificial Intelligence
- The attention module in a transformer is identified as a smoothed cubic spline.
- With ReLU activation, various forms of attention (masked, encoder-decoder) are shown to be cubic splines (a worked formula follows this list).
- All components of a transformer (encoder, decoder, etc.) are constructed from compositions of attention modules and feedforward neural networks, making them cubic or higher-order splines.
- In the other direction, assuming the Pierce–Birkhoff conjecture, every spline can be expressed as a ReLU-activated encoder.
- Replacing ReLU with the smooth softmax yields the original transformer, providing a $C^\infty$-smooth version (a minimal NumPy sketch contrasting the two activations follows this list).
- This insight frames the transformer entirely in terms of splines, well-understood objects in applied mathematics.
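- As a worked reading of the spline claim (a sketch using the standard single-head parameterization, which this summary assumes rather than quotes from the paper): with input $X \in \mathbb{R}^{n \times d}$ and weight matrices $W_Q, W_K, W_V \in \mathbb{R}^{d \times d_k}$,

  $$\operatorname{Att}_\varphi(X) \;=\; \varphi\!\left(\frac{(XW_Q)(XW_K)^\top}{\sqrt{d_k}}\right) XW_V .$$

  Each pre-activation score is a quadratic polynomial in the entries of $X$; applying $\varphi = \mathrm{ReLU}$ entrywise keeps the result piecewise quadratic, and the final multiplication by $XW_V$ (linear in $X$) makes the whole map piecewise cubic, i.e. a cubic spline, whereas $\varphi = \mathrm{softmax}$ (applied row-wise) makes the same map $C^\infty$-smooth.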
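- A minimal runnable sketch (NumPy; the shapes, names, and random weights are illustrative assumptions, not the authors' code) contrasting ReLU-activated attention with the usual softmax attention:

```python
import numpy as np


def relu(z):
    """Entrywise ReLU; keeps the attention map piecewise polynomial in X."""
    return np.maximum(z, 0.0)


def softmax(z, axis=-1):
    """Row-wise softmax; the smooth activation of the original transformer."""
    z = z - z.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(X, W_q, W_k, W_v, activation):
    """Single-head attention: activation((X W_q)(X W_k)^T / sqrt(d_k)) @ (X W_v)."""
    d_k = W_k.shape[1]
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d_k)  # quadratic in the entries of X
    return activation(scores) @ (X @ W_v)            # times another linear factor of X


rng = np.random.default_rng(0)
n, d, d_k = 4, 8, 8                                  # sequence length, model dim, head dim (illustrative)
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d_k)) for _ in range(3))

Y_relu = attention(X, W_q, W_k, W_v, relu)     # piecewise cubic in X: a cubic spline
Y_soft = attention(X, W_q, W_k, W_v, softmax)  # C-infinity smooth in X

print(Y_relu.shape, Y_soft.shape)  # (4, 8) (4, 8)
```

  Swapping `relu` for `softmax` is the only change between the piecewise-cubic variant and the smooth one.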