Hasty Briefs (beta)

A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content

10 days ago
  • #Data Privacy
  • #AI Ethics
  • #Copyright Infringement
  • Meta has scraped data from 6 million websites, including top-ranked domains, to train its AI models.
  • The scraped content includes copyrighted, pirated, and adult material, some potentially illegal.
  • Meta bypassed web protocols such as 'robots.txt', which sites use to tell crawlers not to scrape them, raising ethical and legal concerns.
  • Whistleblowers leaked the data, criticizing Meta's support for Israel and its unethical practices.
  • Meta faces lawsuits from authors and publishers over copyright infringement related to AI training data.
  • The company has invested heavily in hiring top AI talent, including from OpenAI.
  • Meta's scraping practices extend to Content Delivery Networks (CDNs), capturing data repeatedly.
  • Legal challenges against Meta's data scraping have been dismissed on 'fair use' grounds, but concerns remain.
  • Meta has faced internal discontent over its cooperation with the Israeli government and censorship of pro-Palestinian content.
  • The company declined to sign the EU's AI code of practice, citing 'legal uncertainty.'
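
For context on the 'robots.txt' point above: the file is a plain-text list of rules that a well-behaved crawler reads before fetching pages. It is advisory, so nothing technically stops a scraper from ignoring it. The sketch below uses Python's standard `urllib.robotparser` to show what a compliant check might look like. The rules, the `ExampleAIBot` and `OtherBot` user agents, and the `example.com` URLs are hypothetical, chosen for illustration and not taken from the leaked data.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (not from the article).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/x"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A crawler that honors the protocol would skip any URL for which `is_allowed` returns `False`. A crawler that bypasses it simply never performs this check.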