Transformers Are Bayesian Networks
- #Transformers
- #Bayesian Networks
- #Artificial Intelligence
- Transformers are characterized as Bayesian networks, yielding a precise explanation of how they compute.
- Every sigmoid transformer implements weighted loopy belief propagation on its implicit factor graph, a correspondence the paper verifies formally.
- Transformers can implement exact belief propagation on any declared knowledge base, with provably correct probability estimates for non-circular dependencies.
- Uniqueness is proven: a sigmoid transformer that produces exact posteriors must use the belief-propagation weights; no alternative weight configuration achieves them.
- The transformer layer's structure decomposes cleanly: attention acts as AND and the FFN as OR, mirroring Pearl's gather/update algorithm.
- Experimental results corroborate the Bayesian network characterization, showing practical viability of loopy belief propagation.
- Verifiable inference requires a finite concept space; without grounding, correctness is undefined, making hallucination a structural consequence.
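The claim that belief propagation yields provably correct posteriors for non-circular (tree-structured) dependencies can be illustrated with a minimal sketch. This is not the paper's construction; the factor graph, variable names, and numbers below are illustrative. On a tree, a single sum-product message pass recovers the exact marginal, which we check against brute-force enumeration:

```python
from itertools import product

# Tiny tree-structured factor graph over binary variables A and B:
# unary factors (priors) and one pairwise factor phi(A, B).
prior_A = [0.7, 0.3]
prior_B = [0.5, 0.5]
phi = [[0.9, 0.1],   # phi[a][b]
       [0.2, 0.8]]

# Sum-product message from A through phi to B: m(b) = sum_a prior_A[a] * phi[a][b]
msg_to_B = [sum(prior_A[a] * phi[a][b] for a in range(2)) for b in range(2)]

# Belief at B: local prior times incoming message, normalized.
unnorm = [prior_B[b] * msg_to_B[b] for b in range(2)]
Z = sum(unnorm)
belief_B = [u / Z for u in unnorm]

# Brute-force posterior marginal for comparison; the graph is a tree,
# so belief propagation is exact and the two must agree.
joint = {(a, b): prior_A[a] * prior_B[b] * phi[a][b]
         for a, b in product(range(2), repeat=2)}
total = sum(joint.values())
marginal_B = [sum(v for (a, b), v in joint.items() if b == bb) / total
              for bb in range(2)]

print(belief_B, marginal_B)
```

On graphs with cycles the same message updates are iterated ("loopy" BP) and exactness is no longer guaranteed, which is why the non-circularity condition matters.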
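The attention-as-AND, FFN-as-OR decomposition can be made concrete with a standard toy fact: a single sigmoid unit with suitably chosen weights computes a soft AND or a soft OR of binary inputs. The weights below are illustrative choices, not values from the paper:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def soft_and(x: float, y: float) -> float:
    # Bias of -15 places the threshold between one and two active inputs,
    # so the unit saturates near 1 only when both inputs are on.
    return sigmoid(10 * x + 10 * y - 15)

def soft_or(x: float, y: float) -> float:
    # Bias of -5 places the threshold below one active input,
    # so any single active input drives the output near 1.
    return sigmoid(10 * x + 10 * y - 5)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, round(soft_and(x, y)), round(soft_or(x, y)))
```

Rounding the outputs recovers the Boolean AND/OR truth tables, which is the sense in which sigmoid-weighted aggregation can realize logical gathering (AND) and combining (OR) steps.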