More than 340 local news outlets are limiting the Internet Archive's access
- #AI scraping
- #news preservation
- #Internet Archive
- Major news publishers like The New York Times and The Guardian are blocking the Internet Archive to prevent AI companies from scraping training data.
- More than 340 local news sites in the U.S. have restricted the Internet Archive's access; many are owned by large publishers such as USA Today Co. and subsidiaries of Alden Global Capital.
- Journalists and researchers rely on the Wayback Machine for archival news and warn that the blocks threaten long-term preservation of primary sources.
- Publishers cite intellectual property concerns and leverage in AI licensing deals as their reasons for blocking, despite the Archive's anti-abuse measures.
- The blocking trend has expanded to international outlets such as Brazil's Folha de S.Paulo and to major publishers such as Condé Nast and The Atlantic.
- News organizations struggle to archive their own work because of costs, CMS changes, and closures; initiatives such as the Internet Archive's training program aim to help.
- Commercial archives such as ProQuest exist, but the Internet Archive offers free preservation, a contrast that highlights the tension between accessibility and economic incentives.