AI content flood: why the web's signal is dying
8 hours ago
- #Web Quality
- #AI Content
- #Information Theory
- AI-generated content is flooding the web, increasing volume but potentially reducing genuine information diversity, a phenomenon termed 'epistemic heat death' by Jarosław Szulc.
- Key theoretical foundations include Claude Shannon's information theory (AI treated as a zero-information source relative to its training data) and George Akerlof's 'market for lemons' (trust collapses when buyers cannot verify quality).
- Three laws are proposed: (I) verified human content gains value as diversity shrinks, (II) traffic metrics do not measure information value, and (III) the premium on verification compounds over time.
- The evidence is mixed: some studies show high AI involvement in new content, but a Common Crawl analysis found that AI-generated content plateaued after briefly overtaking human content in late 2024.
- Human-authored content still dominates search rankings and chatbot citations, suggesting filtering mechanisms may be favoring quality over raw volume.
- Provenance efforts like C2PA and SynthID aim to verify content authenticity, but adoption faces challenges like metadata stripping during platform compression and low user engagement with verification badges.
- The paper includes a mathematical model indicating that the rate of collapse matters more than the raw volume of synthetic content, and that early interventions are more effective than later ones.
- Critiques note that the model is illustrative rather than predictive, that the empirical data are contested, and that verification systems may exacerbate inequalities by being costly or inaccessible to some groups.
- A concerning implication: flooding channels with synthetic content can be an attack in itself, raising the cost of distinguishing truth from noise and exploiting cognitive biases like the fluency heuristic.
- The paper is a working draft open to empirical testing, with key indicators to watch including search gaps, C2PA engagement, and trends in AI content share.
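
The Shannon framing in the summary above (more volume, less diversity) can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not the paper's model: the corpus, the `shannon_entropy` helper, and the choice to make the synthetic flood repeat only the most common idea are all hypothetical, picked to show how entropy can fall while volume rises.

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution over items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical toy corpus: each letter stands for one distinct idea.
human = ["a", "b", "c", "d"]   # four ideas, equally represented

# Assumption: the synthetic flood only repeats the most common idea.
synthetic = ["a"] * 12

h_before = shannon_entropy(human)              # uniform over 4 ideas
h_copies = shannon_entropy(human * 4)          # 4x volume, same mix of ideas
h_after = shannon_entropy(human + synthetic)   # 4x volume, skewed mix

print(f"human only:         {h_before:.3f} bits")
print(f"verbatim copies:    {h_copies:.3f} bits")
print(f"with synthetic mix: {h_after:.3f} bits")
```

In this toy setup, copying the corpus four times leaves entropy unchanged, while the skewed flood lowers it even though the item count is the same. Real web content is not a bag of discrete ideas, so the numbers carry no empirical weight; the sketch only shows the direction of the effect.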