How crawlers impact the operations of the Wikimedia projects
9 days ago
- #AI scraping
- #Wikimedia
- #infrastructure
- Demand for Wikimedia content, especially multimedia files, has significantly increased since 2024.
- A rise in automated requests from bots scraping content for AI training data is straining infrastructure.
- Example: Jimmy Carter's Wikipedia page saw 2.8M views in a single day, and playback of a video linked from it doubled normal network traffic.
- Bandwidth used for multimedia downloads has grown by 50%, driven largely by non-human traffic from scraper bots.
- Bots account for 65% of the most expensive traffic, consuming resources disproportionately compared to human users.
- Wikimedia is not alone; other websites and open-source projects face similar scraping challenges.
- Need for sustainable content access models to balance free knowledge with infrastructure costs.
- Focus on 'Responsible Use of Infrastructure' in the upcoming fiscal year to prioritize human access.