Transformers Are Bayesian Networks
- #Transformers
- #Bayesian Networks
- #Artificial Intelligence
- Transformers are characterized as Bayesian networks, yielding a precise explanation of how they compute.
- Every sigmoid transformer implements weighted loopy belief propagation on its implicit factor graph, a correspondence the paper verifies formally.
- Transformers can implement exact belief propagation on any declared knowledge base, with provably correct probability estimates for non-circular dependencies.
- Uniqueness is proven: a sigmoid transformer that produces exact posteriors must use the belief-propagation weights; no alternative weight configuration achieves them.
- The transformer layer's structure decomposes cleanly: attention acts as AND and the FFN as OR, mirroring Pearl's gather/update algorithm.
- Experimental results corroborate the Bayesian network characterization, showing practical viability of loopy belief propagation.
- Verifiable inference requires a finite concept space; without grounding, correctness is undefined, making hallucination a structural consequence.
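The claim that belief propagation yields provably correct posteriors for non-circular (tree-structured) dependencies can be illustrated with a minimal sketch. This is not the paper's construction; the factor graph, variable names, and numbers below are illustrative. On a tree, a single sum-product message pass recovers the exact marginal, which we check against brute-force enumeration:

```python
from itertools import product

# Tiny tree-structured factor graph over binary variables A and B:
# unary factors (priors) and one pairwise factor phi(A, B).
prior_A = [0.7, 0.3]
prior_B = [0.5, 0.5]
phi = [[0.9, 0.1],   # phi[a][b]
       [0.2, 0.8]]

# Sum-product message from A through phi to B: m(b) = sum_a prior_A[a] * phi[a][b]
msg_to_B = [sum(prior_A[a] * phi[a][b] for a in range(2)) for b in range(2)]

# Belief at B: local prior times incoming message, normalized.
unnorm = [prior_B[b] * msg_to_B[b] for b in range(2)]
Z = sum(unnorm)
belief_B = [u / Z for u in unnorm]

# Brute-force posterior marginal for comparison; the graph is a tree,
# so belief propagation is exact and the two must agree.
joint = {(a, b): prior_A[a] * prior_B[b] * phi[a][b]
         for a, b in product(range(2), repeat=2)}
total = sum(joint.values())
marginal_B = [sum(v for (a, b), v in joint.items() if b == bb) / total
              for bb in range(2)]

print(belief_B, marginal_B)
```

On graphs with cycles the same message updates are iterated ("loopy" BP) and exactness is no longer guaranteed, which is why the non-circularity condition matters.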
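The attention-as-AND, FFN-as-OR decomposition can be made concrete with a standard toy fact: a single sigmoid unit with suitably chosen weights computes a soft AND or a soft OR of binary inputs. The weights below are illustrative choices, not values from the paper:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def soft_and(x: float, y: float) -> float:
    # Bias of -15 places the threshold between one and two active inputs,
    # so the unit saturates near 1 only when both inputs are on.
    return sigmoid(10 * x + 10 * y - 15)

def soft_or(x: float, y: float) -> float:
    # Bias of -5 places the threshold below one active input,
    # so any single active input drives the output near 1.
    return sigmoid(10 * x + 10 * y - 5)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, round(soft_and(x, y)), round(soft_or(x, y)))
```

Rounding the outputs recovers the Boolean AND/OR truth tables, which is the sense in which sigmoid-weighted aggregation can realize logical gathering (AND) and combining (OR) steps.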