A New List Reveals the Top Websites Meta Is Scraping for Copyrighted Content
11 days ago
- #Data Privacy
- #AI Ethics
- #Copyright Infringement
- Meta has scraped data from 6 million websites, including top-ranked domains, to train its AI models.
- The scraped content includes copyrighted, pirated, and adult material, some of it potentially illegal.
- Meta bypassed web protocols such as 'robots.txt', which sites use to block scraping, raising ethical and legal concerns.
- Whistleblowers leaked the data, criticizing Meta's support for Israel and its unethical practices.
- Meta faces lawsuits from authors and publishers over alleged copyright infringement involving its AI training data.
- The company has invested heavily in hiring top AI talent, including from OpenAI.
- Meta's scraping also extends to Content Delivery Networks (CDNs), from which it captures data repeatedly.
- Legal challenges against Meta's data scraping have been dismissed on 'fair use' grounds, but concerns remain.
- Meta has faced internal discontent over its cooperation with the Israeli government and its censorship of pro-Palestinian content.
- The company declined to sign the EU's AI code of practice, citing 'legal uncertainty.'