Researchers suggest OpenAI trained AI models on paywalled O'Reilly books
a year ago
- #AI Ethics
- #Copyright Infringement
- #OpenAI
- OpenAI accused of training AI on copyrighted content without permission.
- New paper alleges OpenAI used non-public, unlicensed books to train GPT-4o.
- AI models like GPT-4o rely on vast data to predict and generate content.
- Training on synthetic data risks worsening model performance.
- AI Disclosures Project claims GPT-4o recognizes paywalled O’Reilly Media books.
- The paper uses DE-COP, a membership inference attack, to detect copyrighted content in training data.
- GPT-4o shows higher recognition of paywalled content than GPT-3.5 Turbo.
- The finding is not conclusive: OpenAI may have picked up paywalled excerpts from users pasting them into ChatGPT.
- OpenAI seeks high-quality training data, hiring experts to fine-tune models.
- OpenAI has licensing deals but faces lawsuits over copyright practices.
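
The DE-COP check named in the summary can be illustrated with a minimal sketch. It assumes the quiz format commonly described for this kind of test: the model is shown one verbatim passage mixed in with paraphrases and asked to pick the original, and a hit rate well above chance hints that the passage appeared in training data. The names `decop_guess_rate`, `chance_level`, and the `ask_model` callback are hypothetical, not taken from the paper, and the sketch leaves out the paper's actual scoring and calibration.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical types: one quiz item is (verbatim passage, list of paraphrases).
QuizItem = Tuple[str, Sequence[str]]


def chance_level(num_paraphrases: int) -> float:
    """Probability of picking the verbatim passage by pure guessing."""
    return 1.0 / (1 + num_paraphrases)


def decop_guess_rate(
    items: Sequence[QuizItem],
    ask_model: Callable[[Sequence[str]], int],
    seed: int = 0,
) -> float:
    """Fraction of quiz items where the model picks the verbatim passage.

    `ask_model` is an assumed callback: it receives the shuffled options
    and returns the index of the option it believes is the original text.
    """
    if not items:
        raise ValueError("need at least one quiz item")
    rng = random.Random(seed)  # fixed seed so the option order is reproducible
    hits = 0
    for verbatim, paraphrases in items:
        options = [verbatim, *paraphrases]
        rng.shuffle(options)  # hide the original's position from the model
        choice = ask_model(options)
        if options[choice] == verbatim:
            hits += 1
    return hits / len(items)
```

Comparing the returned rate with `chance_level(n)` for the same number of paraphrases gives a rough signal only. A high rate on paywalled passages and a near-chance rate on text published after the model's training cutoff would be consistent with the article's claim, but does not prove it.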