OpenAI's models 'memorized' copyrighted content, new study suggests

a year ago
  • #OpenAI
  • #AI Copyright
  • #Training Data

  • A new study suggests OpenAI may have trained some AI models on copyrighted content without permission.
  • OpenAI faces lawsuits from authors and programmers alleging unauthorized use of their works for model training.
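  • The study introduces a method for detecting 'memorized' training data: it masks 'high-surprisal' words (words that are statistically uncommon in their context) in a passage and checks whether the model can guess them.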
  • GPT-4 and GPT-3.5 were tested and showed signs of memorizing portions of fiction books and New York Times articles.
  • The findings highlight the need for greater transparency and tools to audit AI training data.
  • OpenAI advocates for looser restrictions on using copyrighted data for AI training, despite existing lawsuits.
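
The 'high-surprisal' probe described in the summary can be illustrated with a rough sketch. This is not the study's implementation: the summary does not say how surprisal was scored, so the example assumes a simple smoothed word-frequency estimate over a reference text, and `guess_fn` is a hypothetical stand-in for querying the model under test.

```python
import math
from collections import Counter


def surprisal_scores(words, reference_counts, total):
    """Surprisal in bits per word, from add-one smoothed frequencies.

    Assumption: a unigram frequency model stands in for whatever
    scoring the study actually used.
    """
    vocab = len(reference_counts)
    return [
        -math.log2((reference_counts.get(w.lower(), 0) + 1) / (total + vocab + 1))
        for w in words
    ]


def mask_highest_surprisal(passage, reference_text, mask="[MASK]"):
    """Replace the single most surprising word in `passage` with `mask`.

    Returns the masked passage and the hidden word.
    """
    words = passage.split()
    counts = Counter(reference_text.lower().split())
    total = sum(counts.values())
    scores = surprisal_scores(words, counts, total)
    idx = max(range(len(words)), key=scores.__getitem__)
    answer = words[idx]
    masked = words.copy()
    masked[idx] = mask
    return " ".join(masked), answer


def guess_accuracy(examples, guess_fn):
    """Fraction of masked passages where `guess_fn` recovers the hidden word.

    `guess_fn` is hypothetical: it takes a masked passage and returns
    the model's single-word guess.
    """
    if not examples:
        return 0.0
    correct = sum(
        1 for masked, answer in examples
        if guess_fn(masked).strip().lower() == answer.lower()
    )
    return correct / len(examples)


reference = "the cat sat on the mat the dog sat on the rug"
masked, answer = mask_highest_surprisal("the cat sat on the zeppelin", reference)
print(masked)
print(answer)
```

The intuition is that an unusual word is hard to guess from context alone, so a model that recovers it at a high rate across many passages may have seen those passages during training. A high score is a signal to investigate, not proof of memorization.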