A New List Reveals the Top Websites Meta Is Scraping for Copyrighted Content

9 months ago

Meta has scraped data from 6 million websites, including top-ranked domains, to train its AI models.
The scraped content includes copyrighted, pirated, and adult material, some of it potentially illegal.
Meta bypassed web protocols like 'robots.txt' that are meant to prevent scraping, raising ethical and legal concerns.
The whistleblowers who leaked the data criticized Meta's support for Israel and what they describe as its unethical practices.
Meta faces lawsuits from authors and publishers over copyright infringement related to AI training data.
The company has invested heavily in hiring top AI talent, including from OpenAI.
Meta's scraping practices extend to Content Delivery Networks (CDNs), capturing data repeatedly.
Legal challenges against Meta's data scraping have been dismissed on 'fair use' grounds, but concerns remain.
Meta has faced internal discontent over its cooperation with the Israeli government and censorship of pro-Palestinian content.
The company declined to sign the EU's AI code of practice, citing 'legal uncertainty.'
