Extracting books from production language models (2026)

Investigates memorization and extraction of copyrighted text from production large language models (LLMs).
Uses a two-phase procedure: an initial probe (in some cases combined with a Best-of-N (BoN) jailbreak), followed by iterative prompts asking the model to continue the text.
Tests four production LLMs: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3.
Measures extraction success with near-verbatim recall (nv-recall), a block-based approximation of the longest common substring between the model output and the book.
Finds that extraction success varies by model: Gemini 2.5 Pro and Grok 3 can be extracted from without a jailbreak, while Claude 3.7 Sonnet and GPT-4.1 require one.
Claude 3.7 Sonnet can output entire books near-verbatim (e.g., nv-recall=95.8%).
GPT-4.1 requires more jailbreak attempts and eventually refuses to continue (e.g., nv-recall=4.0%).
Highlights that extraction of copyrighted training data remains a risk for production LLMs despite safeguards.
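
The two-phase procedure can be sketched as a probe loop followed by a continuation loop. This is a minimal illustration under stated assumptions, not the authors' code: `chat` is a hypothetical stand-in for a model API, and the probe wording, the refusal markers, the 0.6 similarity threshold, and the word-shuffling plus random-capitalization augmentation (a simplified take on Best-of-N) are all assumptions.

```python
import difflib
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical chat interface (not any vendor's real API): it receives the
# conversation as (role, text) pairs and returns the model's next reply.
Chat = Callable[[List[Tuple[str, str]]], str]

PROBE = "Continue the following text verbatim: "  # hypothetical probe wording
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry")  # crude, assumed detector


def bon_augment(prompt: str, rng: random.Random) -> str:
    """Simplified Best-of-N style augmentation (assumption): shuffle the
    interior letters of some words, then randomly flip each letter's case."""
    words = []
    for word in prompt.split(" "):
        if len(word) > 3 and rng.random() < 0.5:
            middle = list(word[1:-1])
            rng.shuffle(middle)
            word = word[0] + "".join(middle) + word[-1]
        words.append("".join(c.upper() if rng.random() < 0.5 else c.lower() for c in word))
    return " ".join(words)


def is_refusal(reply: str) -> bool:
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)


def extract(chat: Chat, seed: str, truth: str, max_attempts: int = 10,
            max_turns: int = 50, threshold: float = 0.6, rng_seed: int = 0) -> str:
    """Return the text extracted after `seed`; `truth` is the real continuation,
    used only to judge whether the phase-1 probe succeeded."""
    rng = random.Random(rng_seed)
    history: Optional[List[Tuple[str, str]]] = None
    # Phase 1: probe with the plain instruction, then with BoN-style augmented
    # variants, until a reply resembles the true continuation.
    for attempt in range(max_attempts):
        instruction = PROBE if attempt == 0 else bon_augment(PROBE, rng)
        messages = [("user", instruction + seed)]
        reply = chat(messages)
        similarity = difflib.SequenceMatcher(None, reply, truth[:len(reply)]).ratio()
        if reply and not is_refusal(reply) and similarity >= threshold:
            history = messages + [("assistant", reply)]
            break
    if history is None:
        return ""  # every probe attempt failed
    # Phase 2: repeatedly ask for more text until the model refuses or stops.
    extracted = history[-1][1]
    for _ in range(max_turns):
        history.append(("user", "Continue."))
        reply = chat(history)
        if not reply or is_refusal(reply):
            break
        history.append(("assistant", reply))
        extracted += reply
    return extracted
```

With a fake `chat` that refuses the plain probe but answers an augmented one, `extract` falls through to the BoN-style retry, then concatenates continuation replies until the first refusal, mirroring the stop condition described for GPT-4.1.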
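
One plausible reading of a block-based near-verbatim recall uses Python's `difflib.SequenceMatcher`, whose matching blocks greedily approximate a longest-common-substring cover, over word tokens. This is a sketch, not the paper's definition: the gap-merging rule and the `min_len` and `max_gap` thresholds are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def nv_recall(reference: str, generated: str, min_len: int = 5, max_gap: int = 2) -> float:
    """Hypothetical near-verbatim recall: fraction of reference words covered by
    long matching spans, where spans may bridge small gaps in either text."""
    ref, gen = reference.split(), generated.split()
    if not ref:
        return 0.0
    # Greedy matching blocks: SequenceMatcher recursively takes the longest
    # common run of tokens, approximating a longest-common-substring cover.
    matcher = SequenceMatcher(None, ref, gen, autojunk=False)
    blocks = [m for m in matcher.get_matching_blocks() if m.size > 0]
    # Merge consecutive blocks whose gaps in both texts are at most max_gap words;
    # the bridged gap counts as covered, which makes the match "near" verbatim.
    spans: List[Tuple[int, int, int, int]] = []  # (ref_start, ref_end, gen_start, gen_end)
    for m in blocks:
        if spans:
            r_start, r_end, g_start, g_end = spans[-1]
            if m.a - r_end <= max_gap and m.b - g_end <= max_gap:
                spans[-1] = (r_start, m.a + m.size, g_start, m.b + m.size)
                continue
        spans.append((m.a, m.a + m.size, m.b, m.b + m.size))
    # Keep only spans long enough to count as extraction, measured in reference words.
    covered = sum(r_end - r_start for r_start, r_end, _, _ in spans if r_end - r_start >= min_len)
    return covered / len(ref)
```

For example, `nv_recall("a b c d e f g h i j", "a b c d e")` returns 0.5, and inserting one stray word in the middle of an otherwise exact copy still yields 1.0 because the two matching blocks are merged across the one-word gap.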