Transformers Are Graph Neural Networks
10 months ago
- #Transformers
- #Graph Neural Networks
- #Machine Learning
- Transformers can be viewed as message passing Graph Neural Networks (GNNs) operating on fully connected graphs of tokens.
- Self-attention mechanisms in Transformers capture the relative importance of tokens, while positional encodings provide hints about sequential ordering or structure.
- Transformers are expressive set-processing networks that learn relationships among input elements without being constrained by an a priori graph structure.
- Despite their mathematical connection to GNNs, Transformers are implemented via dense matrix operations, making them more efficient on modern hardware than sparse message passing.
- The author presents the perspective that Transformers are GNNs currently benefiting from the 'hardware lottery': their dense formulation maps far better onto modern accelerators than sparse message passing does.
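The equivalence above can be sketched concretely. This is a minimal illustration (not code from the article, and all sizes are arbitrary): single-head self-attention computed two ways, once as the usual dense matrix product and once as an explicit message-passing loop over every token pair, i.e. over the edges of a fully connected token graph. Both produce the same output.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tokens, d = 4, 8                   # hypothetical sequence length and model dim
X = rng.normal(size=(n_tokens, d))   # token features = node features of the graph
Wq = rng.normal(size=(d, d))         # query, key, value projections
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# View 1: dense formulation — one matmul covers all token pairs at once,
# which is exactly the hardware-friendly form the bullet points describe.
Q, K, V = X @ Wq, X @ Wk, X @ Wv
A = softmax(Q @ K.T / np.sqrt(d))    # attention weights = edge weights
out_dense = A @ V                    # weighted aggregation of value "messages"

# View 2: explicit message passing — each node i gathers a message V[j]
# from every node j (the graph is complete), weighted by softmax scores.
out_mp = np.zeros_like(out_dense)
for i in range(n_tokens):
    scores = np.array([Q[i] @ K[j] / np.sqrt(d) for j in range(n_tokens)])
    weights = softmax(scores)
    out_mp[i] = sum(weights[j] * V[j] for j in range(n_tokens))

# The two views agree: self-attention is message passing on a complete graph.
assert np.allclose(out_dense, out_mp)
```

The loop makes the GNN reading explicit, while the dense version shows why Transformers win on GPUs: the whole aggregation collapses into two matrix multiplications.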