How Meta AI Staff Deemed More Than 7M Books to Have No "Economic Value"

a year ago

Meta AI initially stated that using pirated books for AI training is a copyright violation, but its later responses varied, with the inconsistency attributed to 'hallucinations' in generative AI.
Meta is facing a lawsuit (Kadrey et al. v. Meta Platforms) for allegedly using over 7 million pirated books to train its AI model, Llama, without consent or payment.
Plaintiffs, including prominent authors such as Junot Díaz and Sarah Silverman, argue that Meta's actions infringe their copyrights, while Meta defends its use as 'highly transformative' fair use.
The case is part of a broader legal battle involving over 16 copyright lawsuits against AI companies for using copyrighted material without permission.
Internal Meta communications reveal debates over using pirated books, with some employees expressing ethical concerns while others adopted a 'don’t-ask-don’t-tell' approach.
Meta argues that any individual book has a negligible impact on AI performance, likening its contribution to noise in the data, and that licensing millions of works is impractical.
Authors and publishers, including the Authors Guild, advocate for consent and compensation for AI training, fearing AI-generated content could replace human creativity.
OpenAI and Google have also faced scrutiny for using pirated content, though OpenAI claims its current models do not rely on LibGen (Library Genesis), the shadow library of pirated books.
The case raises questions about the commodification of literature and the ethical implications of AI training on copyrighted works without compensation.
