How crawlers impact the operations of the Wikimedia projects

9 days ago

Demand for Wikimedia content, especially multimedia files, has significantly increased since 2024.
A rise in automated requests from scraping bots collecting AI training data is straining infrastructure.
Example: Jimmy Carter's Wikipedia page saw 2.8M views in a day, with video playback doubling network traffic.
Bandwidth for multimedia downloads grew by 50%, largely due to non-human traffic from scraper bots.
65% of expensive traffic comes from bots, which consume a disproportionate share of resources compared with human users.
Wikimedia is not alone; other websites and open-source projects face similar scraping challenges.
Sustainable content access models are needed to balance free knowledge with infrastructure costs.
Wikimedia will focus on 'Responsible Use of Infrastructure' in the upcoming fiscal year to prioritize human access.
