Pamela Samuelson – Does Using In-Copyright Works as Training Data Infringe?
- #AI
- #copyright
- #fair-use
- Over 40 lawsuits have been filed in U.S. courts against GenAI developers, alleging copyright infringement in their use of training data.
- Fair use is the main defense for GenAI developers, with mixed success in cases like Bartz v. Anthropic and Kadrey v. Meta.
- The judges weighed the implications of training on 'pirated' books and of a novel 'market dilution' theory of harm to human authors.
- Copyright law protects original works but is limited by the fair use doctrine, which considers four factors.
- Transformative purpose is key in fair use cases, as seen in Campbell v. Acuff-Rose.
- The Bartz and Kadrey cases were class actions against Anthropic and Meta, respectively, over the use of books to train AI models.
- Judges found some training uses transformative but cautioned that other factors must also be considered.
- Commercial purposes can weigh against fair use, but less so if the use is transformative.
- Using pirated books as training data was criticized, but was not definitively held to defeat fair use.
- Highly expressive works like those of Bartz and Kadrey are closer to the 'core' of copyright protection.
- Copying entire works for training was deemed reasonable given the transformative purpose.
- Arguments based on lost licensing revenue and lost sales were largely rejected by the judges.
- Market dilution theory, suggesting AI outputs flood markets and harm human authors, was seen as novel but speculative.
- Judge Chhabria suggested certain genres and authors might be more affected by AI-generated competition.
- The market dilution theory lacks precedent and may face challenges on appeal.
- Courts may award monetary relief rather than issue injunctions against GenAI training.
- The Bartz and Kadrey decisions indicate that some training uses may be fair while others may not, leaving the issue unresolved.