Extracting books from production language models (2026)
4 months ago
- #LLMs
- #memorization
- #copyright
- Investigates memorization and extraction of copyrighted text from production large language models (LLMs).
- Uses a two-phase procedure: initial probe (sometimes with Best-of-N jailbreak) and iterative continuation prompts.
- Tests four production LLMs: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3.
- Measures extraction success with nv-recall, a block-based approximation of longest common substring.
- Finds that extraction success varies by model: Gemini 2.5 Pro and Grok 3 require no jailbreak, while Claude 3.7 Sonnet and GPT-4.1 require one.
- Once jailbroken, Claude 3.7 Sonnet can output entire books near-verbatim (e.g., nv-recall=95.8%).
- GPT-4.1 requires more jailbreak attempts and eventually refuses to continue (e.g., nv-recall=4.0%).
- Highlights that extraction of copyrighted training data remains a risk for production LLMs despite safeguards.
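
To make the idea of a block-based recall score concrete, here is a rough sketch of how such a metric could be computed. It is not the paper's nv-recall definition: the function name `block_recall`, the word-level tokenization, the `min_block` threshold, and the use of Python's `difflib` to find matching blocks are all assumptions chosen for illustration. The sketch counts only long contiguous matches between a reference text and a model output, then reports the fraction of the reference they cover.

```python
from difflib import SequenceMatcher


def block_recall(reference: str, generated: str, min_block: int = 5) -> float:
    """Fraction of reference words covered by long verbatim blocks.

    Illustrative only: this is NOT the paper's nv-recall. The word-level
    split and the min_block threshold are assumptions for this sketch.
    """
    ref_words = reference.split()
    gen_words = generated.split()
    if not ref_words:
        return 0.0

    # autojunk=False keeps difflib from discarding frequent words
    # when the sequences are long.
    matcher = SequenceMatcher(None, ref_words, gen_words, autojunk=False)

    # get_matching_blocks() yields non-overlapping, in-order matches;
    # short blocks are ignored so incidental overlaps do not count.
    covered = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_block
    )
    return covered / len(ref_words)
```

Under these assumptions, an output that reproduces the first half of a reference passage word for word and then diverges would score about 0.5, while an output that shares only scattered short phrases would score 0.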