Gone but Not Forgotten: Recovering the Dead Web
4 hours ago
- #web-archiving
- #link-rot
- #digital-preservation
- Pew Research found that 38% of webpages that existed in 2013 are no longer accessible a decade later, and that 25% of all pages that existed at some point between 2013 and 2023 are now dead.
- The Wayback Machine rescues around 15% of dead pages from the Pew dataset, cutting the share of vanished URLs in that sample from 26% to 10%.
- Other studies report different link-rot rates: Ahrefs found that 66.5% of links were dead after nine years, while a 2021 analysis of New York Times links found that 25% of deep links had rotted.
- The Old Dominion University (ODU) study of 27.3 million URLs found that 65% were dead by 2023, though every sampled URL had been archived by the Wayback Machine.
- Key terms include 'Rescued' (dead on the live web but archived) and 'Endangered' (alive but unarchived, and so at risk of vanishing).
- Limitations in archiving include resource constraints, JavaScript-heavy pages, bot blocking, and paywalls, but initiatives like IndexNow aim to improve link discovery.
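
The 'Rescued' and 'Endangered' terms above amount to a two-by-two classification: is the URL alive on the live web, and does an archived copy exist? The following is a minimal sketch of that idea in Python, not the method used by any of the studies cited. The `UrlStatus` type, the `classify` and `summarize` functions, and the extra labels `alive_and_archived` and `vanished` are hypothetical names chosen for illustration. The sketch also assumes the two boolean flags were determined elsewhere, for example by an HTTP check and an archive lookup.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlStatus:
    """Hypothetical record for one URL; both flags are assumed inputs."""
    url: str
    live: bool      # True if the URL still resolves on the live web
    archived: bool  # True if at least one archived snapshot exists


def classify(status: UrlStatus) -> str:
    """Map the (live, archived) pair to one of four illustrative labels."""
    if status.live and status.archived:
        return "alive_and_archived"
    if status.live:
        return "endangered"  # alive but unarchived
    if status.archived:
        return "rescued"     # dead on the live web but archived
    return "vanished"        # dead and never archived


def summarize(statuses: list[UrlStatus]) -> dict[str, float]:
    """Return the fraction of URLs that fall under each label."""
    total = len(statuses)
    if total == 0:
        return {}
    counts = Counter(classify(s) for s in statuses)
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        UrlStatus("https://example.com/a", live=True, archived=True),
        UrlStatus("https://example.com/b", live=True, archived=False),
        UrlStatus("https://example.com/c", live=False, archived=True),
        UrlStatus("https://example.com/d", live=False, archived=False),
    ]
    for label, share in summarize(sample).items():
        print(f"{label}: {share:.0%}")
```

Keeping the classification separate from the network checks means the same labels can be applied to any dataset that records those two flags, whatever tool produced them.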