Researchers suggest OpenAI trained AI models on paywalled O'Reilly books

a year ago
  • #AI Ethics
  • #Copyright Infringement
  • #OpenAI
  • OpenAI accused of training AI on copyrighted content without permission.
  • New paper alleges OpenAI used non-public, unlicensed books to train GPT-4o.
  • AI models like GPT-4o rely on vast data to predict and generate content.
  • Training on synthetic data risks worsening model performance.
  • AI Disclosures Project claims GPT-4o recognizes paywalled O’Reilly Media books.
  • DE-COP, a membership inference method, used to detect copyrighted content in training data.
  • GPT-4o shows higher recognition of paywalled content than GPT-3.5 Turbo.
  • OpenAI may have obtained paywalled excerpts from text users pasted into ChatGPT.
  • OpenAI seeks high-quality training data, hiring experts to fine-tune models.
  • OpenAI has licensing deals but faces lawsuits over copyright practices.