Transformers Are Graph Neural Networks
10 months ago
- #Transformers
- #Graph Neural Networks
- #Machine Learning
- Transformers can be viewed as message passing Graph Neural Networks (GNNs) operating on fully connected graphs of tokens.
- Self-attention mechanisms in Transformers capture the relative importance of tokens, while positional encodings provide hints about sequential ordering or structure.
- Transformers are expressive set-processing networks that learn relationships among input elements without being constrained by an a priori graph structure.
- Despite their mathematical connection to GNNs, Transformers are implemented via dense matrix operations, making them more efficient on modern hardware than sparse message passing.
- The author presents the perspective that Transformers are GNNs currently benefiting from the 'hardware lottery': their dense formulation maps far better onto modern accelerators than sparse message passing does.
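The equivalence above can be sketched concretely. This is a minimal illustration (not code from the article, and all sizes are arbitrary): single-head self-attention computed two ways, once as the usual dense matrix product and once as an explicit message-passing loop over every token pair, i.e. over the edges of a fully connected token graph. Both produce the same output.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tokens, d = 4, 8                   # hypothetical sequence length and model dim
X = rng.normal(size=(n_tokens, d))   # token features = node features of the graph
Wq = rng.normal(size=(d, d))         # query, key, value projections
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# View 1: dense formulation — one matmul covers all token pairs at once,
# which is exactly the hardware-friendly form the bullet points describe.
Q, K, V = X @ Wq, X @ Wk, X @ Wv
A = softmax(Q @ K.T / np.sqrt(d))    # attention weights = edge weights
out_dense = A @ V                    # weighted aggregation of value "messages"

# View 2: explicit message passing — each node i gathers a message V[j]
# from every node j (the graph is complete), weighted by softmax scores.
out_mp = np.zeros_like(out_dense)
for i in range(n_tokens):
    scores = np.array([Q[i] @ K[j] / np.sqrt(d) for j in range(n_tokens)])
    weights = softmax(scores)
    out_mp[i] = sum(weights[j] * V[j] for j in range(n_tokens))

# The two views agree: self-attention is message passing on a complete graph.
assert np.allclose(out_dense, out_mp)
```

The loop makes the GNN reading explicit, while the dense version shows why Transformers win on GPUs: the whole aggregation collapses into two matrix multiplications.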