Hasty Briefs (beta)

Attention is a smoothed cubic spline

9 days ago
  • #Transformers
  • #Splines
  • #Artificial Intelligence
  • The attention module in a transformer is identified as a smoothed cubic spline.
  • With ReLU activation, the various forms of attention (masked, encoder-decoder) are shown to be cubic splines.
  • All components of a transformer (encoder, decoder, etc.) are built from compositions of attention modules and feed-forward neural networks, making them cubic or higher-order splines.
  • Conversely, assuming the Pierce-Birkhoff conjecture, every spline can be represented as a ReLU-activated encoder.
  • Replacing ReLU with a smooth activation such as softmax recovers the original transformer, giving a $C^\infty$-smooth version (a minimal sketch of the two activation choices follows this list).
  • This insight frames the transformer entirely in terms of splines, well-understood objects in applied mathematics.
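To make the spline picture concrete, here is a minimal NumPy sketch of a single attention head with the two activation choices discussed above. It is an illustration under simplifying assumptions, not the paper's exact construction: the weight shapes, the omission of the $1/\sqrt{d}$ scaling, and the unnormalized ReLU scores are choices made for brevity.

```python
import numpy as np

def attention(X, W_Q, W_K, W_V, activation="softmax"):
    """Single attention head: activation(X W_Q (X W_K)^T) X W_V.

    With activation="relu", each output entry is a piecewise polynomial of
    degree 3 in the entries of X: the score matrix X W_Q (X W_K)^T is
    quadratic in X, ReLU keeps it piecewise quadratic, and multiplying by
    X W_V (linear in X) gives piecewise cubics, i.e. a cubic spline.
    With activation="softmax" the map is instead C^infinity-smooth.
    """
    scores = X @ W_Q @ (X @ W_K).T               # quadratic in the entries of X
    if activation == "relu":
        A = np.maximum(scores, 0.0)               # piecewise quadratic in X
    else:
        A = np.exp(scores - scores.max(axis=-1, keepdims=True))
        A = A / A.sum(axis=-1, keepdims=True)     # row-wise softmax
    return A @ (X @ W_V)                          # degree 3 per piece for ReLU

# Toy usage: n tokens of dimension d
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.standard_normal((n, d))
W_Q, W_K, W_V = (rng.standard_normal((d, d)) for _ in range(3))
print(attention(X, W_Q, W_K, W_V, activation="relu").shape)     # (4, 8)
print(attention(X, W_Q, W_K, W_V, activation="softmax").shape)  # (4, 8)
```

The degree count in the docstring is the whole point of the ReLU variant: each coordinate of the output is a piecewise cubic function of the input tokens, and swapping ReLU for softmax smooths those pieces into a single $C^\infty$ map.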