On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
- #Large Language Models
- #Hallucinations
- #Artificial Intelligence
- The paper investigates hallucination-associated neurons (H-Neurons) in Large Language Models (LLMs).
- H-Neurons are a sparse subset (fewer than 0.1% of all neurons) whose activations predict hallucination occurrence.
- These neurons are causally linked to over-compliance behaviors in LLMs.
- H-Neurons emerge during pre-training and remain predictive when used for hallucination detection.
- The study bridges macroscopic behavioral patterns with microscopic neural mechanisms in LLMs.
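The idea of a tiny neuron subset predicting a behavioral label can be sketched with a toy probe. This is purely illustrative and not the paper's method: the `activations`, `labels`, and the mean-difference scoring criterion below are all assumptions for demonstration, standing in for real MLP activations and hallucination annotations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical, not the paper's data): activations of
# n_neurons MLP neurons over n_samples generations, plus a binary
# label marking whether each generation was hallucinated.
n_samples, n_neurons = 2000, 1000
activations = rng.normal(size=(n_samples, n_neurons))
labels = rng.integers(0, 2, size=n_samples)

# Plant a single "H-Neuron" (sparsity in the spirit of <0.1%):
# one neuron whose mean activation shifts on hallucinated samples.
h_idx = 7
activations[:, h_idx] += 2.0 * labels

# Score each neuron by the absolute gap in mean activation between
# hallucinated and faithful samples (a simple linear-probe criterion).
mean_pos = activations[labels == 1].mean(axis=0)
mean_neg = activations[labels == 0].mean(axis=0)
scores = np.abs(mean_pos - mean_neg)

# Keep only the top-k neurons as the candidate predictive subset.
k = 1
candidates = np.argsort(scores)[-k:]
print(candidates)  # the planted neuron should rank highest
```

With enough samples the planted neuron dominates the score ranking, which mirrors the paper's claim that a very small, identifiable subset of neurons carries the predictive signal.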