Is Meta Scraping the Fediverse for AI?
11 days ago
- #Meta
- #Fediverse
- #AI Scraping
- Meta is allegedly scraping independent sites, including Fediverse instances, for AI training data, disregarding robots.txt.
- Meta denies the allegations, calling the report incorrect, but evidence suggests widespread data scraping efforts.
- A leaked 1,659-page PDF lists numerous Fediverse instances (Mastodon, Lemmy, PeerTube) potentially affected by Meta's scraping.
- Admins are advised to check whether their instances are listed and to weigh federation risks, since posts cached on other instances may still be scraped.
- Protective measures include establishing Terms of Service that prohibit scraping, requesting data removal via Meta's forms, and filing GDPR complaints (EU only).
- Technical measures such as proof-of-work firewalls (e.g., Anubis), zip bombs, and blocking known AI crawler user agents can help mitigate scraping.
- The lack of clear regulation and corporate disregard for norms complicates efforts to combat AI scraping.
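
As a rough illustration of the "blocking AI user agents" measure mentioned above, the following Python sketch wraps a WSGI application and returns a 403 response to requests whose User-Agent header contains a listed crawler token. The token list, the function names, and the response text are all assumptions made for this example; they do not come from the article, and the list is neither complete nor authoritative. Because a User-Agent header is trivially spoofed, this filters out only crawlers that identify themselves honestly.

```python
from typing import Callable, Iterable

# Illustrative, non-exhaustive crawler tokens. This list is an assumption
# made for the example; maintain your own from current crawler documentation.
BLOCKED_AGENT_TOKENS = (
    "meta-externalagent",
    "meta-externalfetcher",
    "facebookbot",
    "gptbot",
    "ccbot",
)


def is_blocked_user_agent(
    user_agent: str, tokens: Iterable[str] = BLOCKED_AGENT_TOKENS
) -> bool:
    """Return True if the User-Agent contains any blocked token (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in tokens)


def block_ai_crawlers(app: Callable) -> Callable:
    """Wrap a WSGI app so that requests from listed crawlers receive a 403."""

    def middleware(environ, start_response):
        if is_blocked_user_agent(environ.get("HTTP_USER_AGENT", "")):
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain; charset=utf-8")],
            )
            return [b"Automated scraping is not permitted.\n"]
        # Everything else passes through to the wrapped application untouched.
        return app(environ, start_response)

    return middleware
```

In practice the same check is often done at the reverse proxy rather than in application code, and it works best alongside the other measures in the list, since a scraper that ignores robots.txt may also misreport its user agent.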