On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
- #Large Language Models
- #Hallucinations
- #Artificial Intelligence
- The paper investigates hallucination-associated neurons (H-Neurons) in Large Language Models (LLMs).
- H-Neurons are a sparse subset (fewer than 0.1% of all neurons) whose activations predict hallucination occurrence.
- These neurons are causally linked to over-compliance behaviors in LLMs.
- H-Neurons emerge during pre-training and remain predictive when used for hallucination detection.
- The study bridges macroscopic behavioral patterns with microscopic neural mechanisms in LLMs.
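The idea of a tiny neuron subset predicting a behavioral label can be sketched with a toy probe. This is purely illustrative and not the paper's method: the `activations`, `labels`, and the mean-difference scoring criterion below are all assumptions for demonstration, standing in for real MLP activations and hallucination annotations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical, not the paper's data): activations of
# n_neurons MLP neurons over n_samples generations, plus a binary
# label marking whether each generation was hallucinated.
n_samples, n_neurons = 2000, 1000
activations = rng.normal(size=(n_samples, n_neurons))
labels = rng.integers(0, 2, size=n_samples)

# Plant a single "H-Neuron" (sparsity in the spirit of <0.1%):
# one neuron whose mean activation shifts on hallucinated samples.
h_idx = 7
activations[:, h_idx] += 2.0 * labels

# Score each neuron by the absolute gap in mean activation between
# hallucinated and faithful samples (a simple linear-probe criterion).
mean_pos = activations[labels == 1].mean(axis=0)
mean_neg = activations[labels == 0].mean(axis=0)
scores = np.abs(mean_pos - mean_neg)

# Keep only the top-k neurons as the candidate predictive subset.
k = 1
candidates = np.argsort(scores)[-k:]
print(candidates)  # the planted neuron should rank highest
```

With enough samples the planted neuron dominates the score ranking, which mirrors the paper's claim that a very small, identifiable subset of neurons carries the predictive signal.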