Hasty Briefs (beta)

How crawlers impact the operations of the Wikimedia projects

9 days ago
  • #AI scraping
  • #Wikimedia
  • #infrastructure
  • Demand for Wikimedia content, especially multimedia files, has significantly increased since 2024.
  • Automated requests from bots that scrape content for AI training data are rising and straining infrastructure.
  • Example: Jimmy Carter's Wikipedia page received 2.8M views in a single day, while video playback doubled network traffic.
  • Bandwidth for multimedia downloads grew by 50%, largely due to non-human traffic from scraper bots.
  • 65% of the most resource-intensive ("expensive") traffic comes from bots, which consume a disproportionate share of resources compared with human users.
  • Wikimedia is not alone; other websites and open-source projects face similar scraping challenges.
  • Sustainable content access models are needed to balance free knowledge with infrastructure costs.
  • Wikimedia will focus on 'Responsible Use of Infrastructure' in the upcoming fiscal year to prioritize human access.