More than 340 local news outlets are limiting the Internet Archive's access

4 hours ago
  • #AI scraping
  • #news preservation
  • #Internet Archive
  • Major news publishers like The New York Times and The Guardian are blocking the Internet Archive to prevent AI companies from scraping training data.
  • Over 340 local news sites in the U.S. have restricted access, with many owned by large publishers such as USA Today Co. and Alden Global Capital subsidiaries.
  • Journalists and researchers rely on the Wayback Machine for archived news and warn that the blocks threaten long-term preservation of primary sources.
  • Publishers cite intellectual property concerns and leverage in AI licensing deals as their reasons for blocking, despite the Archive's anti-abuse measures.
  • The blocking trend has expanded to include international outlets like Brazil's Folha de S.Paulo and major publishers like Condé Nast and The Atlantic.
  • News organizations struggle to archive their own work because of costs, CMS changes, and closures; initiatives like the Internet Archive's training program aim to help.
  • While commercial archives like ProQuest exist, the Internet Archive offers free preservation, highlighting tensions between accessibility and economic incentives.