Hasty Briefs (beta)

Is Meta Scraping the Fediverse for AI?

11 days ago
  • #Meta
  • #Fediverse
  • #AI Scraping
  • Meta is allegedly scraping independent sites, including Fediverse instances, for AI training data, disregarding robots.txt.
  • Meta denies the allegations, calling the report incorrect, but evidence suggests widespread data scraping efforts.
  • A leaked 1,659-page PDF lists numerous Fediverse instances (Mastodon, Lemmy, PeerTube) potentially affected by Meta's scraping.
  • Admins are advised to check whether their instances are listed and to weigh federation risks: posts cached on other instances may still be scraped.
  • Protective measures include establishing Terms of Service that prohibit scraping, requesting data removal via Meta's forms, and filing GDPR complaints (EU only).
  • Technical measures such as firewalls (e.g., Anubis), zip bombs, and blocking AI user agents can help mitigate scraping.
  • The lack of clear regulation and corporate disregard for norms complicate efforts to combat AI scraping.
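
As an illustration of the "blocking AI user agents" measure mentioned above, here is a minimal Python sketch of a WSGI middleware that answers 403 to requests whose User-Agent header contains a known crawler token. The token list, the `is_blocked_agent` helper, and the `BlockAIAgents` class are assumptions made for this example, not something taken from the original report. The tokens are illustrative and non-exhaustive, so check each vendor's crawler documentation before relying on them. This approach only catches crawlers that identify themselves honestly; a scraper that ignores robots.txt may also spoof its user agent.

```python
# Illustrative, non-exhaustive list of crawler tokens (an assumption for this
# sketch; verify against each vendor's published crawler documentation).
BLOCKED_AGENT_TOKENS = (
    "meta-externalagent",
    "gptbot",
    "claudebot",
    "ccbot",
    "bytespider",
)


def is_blocked_agent(user_agent):
    """Return True if the User-Agent contains a blocked token (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


class BlockAIAgents:
    """Hypothetical WSGI middleware: reply 403 to blocked user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked_agent(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Everything else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

To use it, wrap any WSGI application, for example `application = BlockAIAgents(application)`. Many admins would apply the same rule at the reverse proxy instead; the middleware form is used here only because it is self-contained.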