Hasty Briefs (beta)


Transformers Are Bayesian Networks

9 hours ago
  • #Transformers
  • #Bayesian Networks
  • #Artificial Intelligence
  • Transformers are Bayesian networks: this characterization gives a precise explanation of how they work.
  • Every sigmoid transformer implements weighted loopy belief propagation on its implicit factor graph, verified formally.
  • Transformers can implement exact belief propagation on any declared knowledge base, with provably correct probability estimates for non-circular dependencies.
  • Uniqueness is proven: a sigmoid transformer that produces exact posteriors must use the belief-propagation weights; no alternative weight settings exist.
  • The transformer layer's structure is delineated: attention as AND, FFN as OR, mirroring Pearl's gather/update algorithm.
  • Experimental results corroborate the Bayesian network characterization, showing practical viability of loopy belief propagation.
  • Verifiable inference requires a finite concept space; without grounding, correctness is undefined, making hallucination a structural consequence.
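The central claim above is that transformers implement weighted loopy belief propagation. As a point of reference, here is a minimal sketch of sum-product loopy belief propagation on a tiny pairwise factor graph; the graph, potentials, and variable names are illustrative assumptions, not code from the article.

```python
# Loopy sum-product belief propagation on a three-variable cycle
# (A - B - C - A), the simplest genuinely "loopy" graph.
# All potentials and names here are illustrative assumptions.

edges = [("A", "B"), ("B", "C"), ("C", "A")]
variables = ["A", "B", "C"]

def potential(x, y):
    # Pairwise potential favoring agreement between neighbors.
    return 2.0 if x == y else 1.0

def neighbors(v):
    return [b if a == v else a for a, b in edges if v in (a, b)]

# messages[(src, dst)][value]: message from variable src to neighbor dst,
# initialized uniform.
messages = {(u, v): [1.0, 1.0]
            for a, b in edges for (u, v) in [(a, b), (b, a)]}

for _ in range(50):  # iterate messages to (approximate) convergence
    new = {}
    for (src, dst) in messages:
        msg = []
        for y in (0, 1):        # value of the receiving variable
            total = 0.0
            for x in (0, 1):    # marginalize over the sender's values
                incoming = 1.0
                for n in neighbors(src):
                    if n != dst:
                        incoming *= messages[(n, src)][x]
                total += potential(x, y) * incoming
            msg.append(total)
        s = sum(msg)
        new[(src, dst)] = [m / s for m in msg]  # normalize for stability
    messages = new

def marginal(v):
    # Belief at v: product of incoming messages, normalized.
    belief = [1.0, 1.0]
    for n in neighbors(v):
        for x in (0, 1):
            belief[x] *= messages[(n, v)][x]
    s = sum(belief)
    return [b / s for b in belief]

for v in variables:
    print(v, marginal(v))
```

With fully symmetric potentials and uniform initialization, each marginal comes out uniform, which is a quick sanity check that the message updates are implemented correctly.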
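The "attention as AND, FFN as OR" reading can be made concrete with standard soft-logic idioms: with sigmoid outputs near 0 or 1, a product of probabilities behaves as a soft AND and a noisy-OR combination behaves as a soft OR. The gate constructions below are common textbook devices assumed for illustration, not code from the article.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_and(probs):
    # Conjunction of independent "feature present" probabilities.
    out = 1.0
    for p in probs:
        out *= p
    return out

def noisy_or(probs):
    # Disjunction: fires unless every input fails to fire.
    out = 1.0
    for p in probs:
        out *= (1.0 - p)
    return 1.0 - out

# Saturated sigmoid inputs behave like Boolean values.
hi, lo = sigmoid(6.0), sigmoid(-6.0)   # roughly 0.998 and 0.002
print(soft_and([hi, hi]))   # near 1: AND of two "true" inputs
print(soft_and([hi, lo]))   # near 0: one "false" input kills the AND
print(noisy_or([lo, hi]))   # near 1: one "true" input suffices
print(noisy_or([lo, lo]))   # near 0
```

In the Boolean limit (probabilities exactly 0 or 1), both gates reduce to their exact logical counterparts, which is the sense in which a sigmoid layer can mirror Pearl's gather/update steps.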