Wikipedia offers AI developers its article data on Kaggle to stop scraping
a year ago
- #AI
- #Kaggle
- #Wikipedia
- The Wikimedia Foundation is offering an AI-ready Wikipedia dataset on Kaggle to discourage developers from scraping the site.
- The dataset is structured JSON containing article abstracts, short descriptions, infobox data, and image links.
- The content is licensed under Creative Commons and the GNU Free Documentation License.
- Kaggle, which hosts more than 461,000 datasets, now also carries the English and French editions of Wikipedia.
- The dataset is meant to reduce the load that scrapers put on Wikipedia's servers and to give AI developers clean, pre-parsed data for training.
- The release is an early beta, and the foundation welcomes community feedback and discussion.
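Because the articles arrive pre-parsed, a developer can read fields directly instead of scraping and cleaning HTML. The sketch below shows that idea under loud assumptions: it assumes one JSON object per line (JSON Lines), and the field names `name`, `description`, `abstract`, `infobox`, and `image.url` are hypothetical placeholders, not the dataset's published schema. The two sample records are invented for illustration.

```python
import json
from typing import Iterable, Iterator

# Hypothetical sample records. The field names are assumptions for
# illustration only, not the actual schema of the Kaggle dataset.
SAMPLE_JSONL = """\
{"name": "Kaggle", "description": "Data science platform", "abstract": "Kaggle is an online community of data scientists.", "infobox": {"Founded": "2010"}, "image": {"url": "https://example.org/kaggle.png"}}
{"name": "Wikipedia", "description": "Free online encyclopedia", "abstract": "Wikipedia is a free online encyclopedia.", "infobox": {"Launched": "2001"}, "image": null}
"""


def iter_articles(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed article per non-empty JSON Lines record."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(article: dict) -> dict:
    """Pick out the fields a training pipeline might use, tolerating gaps."""
    image = article.get("image") or {}  # "image" may be missing or null
    return {
        "title": article.get("name", ""),
        "description": article.get("description", ""),
        "abstract": article.get("abstract", ""),
        "infobox": article.get("infobox", {}),
        "image_url": image.get("url"),
    }


if __name__ == "__main__":
    for record in iter_articles(SAMPLE_JSONL.splitlines()):
        print(summarize(record))
```

Reading records line by line keeps memory use flat on a large file; with the real dataset, the field lookups in `summarize` would need to match whatever schema Wikimedia actually publishes.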