LLMs Can Get "Brain Rot"
a day ago
- #Data Quality
- #LLM
- #Cognitive Decline
- Proposes the LLM Brain Rot Hypothesis: continual exposure to junk web text causes cognitive decline in LLMs.
- Runs controlled experiments on real Twitter/X corpora, building junk and control datasets defined by two metrics: engagement degree (M1) and semantic quality (M2).
- Finds significant declines in reasoning, long-context understanding, and safety, along with increased 'dark traits' (e.g., psychopathy, narcissism), in LLMs trained on junk data.
- Observes a dose-response relationship: the higher the junk ratio in the training data, the greater the cognitive decay (e.g., ARC-Challenge drops from 74.9 to 57.2).
- Error analysis reveals 'thought skipping' as a major failure mode in reasoning tasks.
- Cognitive decline persists despite post-hoc fine-tuning, indicating lasting effects of junk data exposure.
- Calls for re-examination of data collection and continual pre-training practices to prevent cumulative harms.
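
To make the dose-response idea concrete, here is a minimal Python sketch of how a training mix at a given junk ratio could be assembled, plus the arithmetic behind the ARC-Challenge drop quoted above. The function names (`mix_corpus`, `score_drop`), the placeholder documents, and the ratio grid are illustrative assumptions, not the authors' pipeline; only the scores 74.9 and 57.2 come from the summary.

```python
import random


def mix_corpus(junk, control, junk_ratio, size, seed=0):
    """Sample `size` documents, with a `junk_ratio` share drawn from `junk`.

    Hypothetical helper: the summary does not describe the actual mixing code.
    """
    if not 0.0 <= junk_ratio <= 1.0:
        raise ValueError("junk_ratio must be in [0, 1]")
    rng = random.Random(seed)
    n_junk = round(size * junk_ratio)
    n_control = size - n_junk
    if n_junk > len(junk) or n_control > len(control):
        raise ValueError("not enough documents to sample without replacement")
    mixed = rng.sample(junk, n_junk) + rng.sample(control, n_control)
    rng.shuffle(mixed)  # interleave junk and control documents
    return mixed


def score_drop(baseline, degraded):
    """Return (absolute drop in points, relative drop as a fraction of baseline)."""
    absolute = baseline - degraded
    return absolute, absolute / baseline


# Placeholder corpora standing in for the junk and control datasets.
junk = [f"junk-{i}" for i in range(100)]
control = [f"control-{i}" for i in range(100)]

# Assumed ratio grid, chosen only for illustration.
for ratio in (0.0, 0.2, 0.5, 0.8, 1.0):
    mixed = mix_corpus(junk, control, ratio, size=50)
    n_junk = sum(doc.startswith("junk") for doc in mixed)
    print(f"junk_ratio={ratio:.1f}: {n_junk} junk / {len(mixed) - n_junk} control")

# ARC-Challenge scores quoted in the summary above.
absolute, relative = score_drop(74.9, 57.2)
print(f"ARC-Challenge drop: {absolute:.1f} points ({relative:.1%} relative)")
```

Holding the total sample size fixed while varying only the junk share keeps the mixes comparable, so differences in downstream scores can be read against the ratio rather than against the amount of data.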