Detecting and Preventing Distillation Attacks
- #distillation attacks
- #AI security
- #national security
- Industrial-scale distillation attacks by AI labs (DeepSeek, Moonshot, MiniMax) sought to extract Claude's capabilities.
- Over 16 million exchanges via 24,000 fraudulent accounts, violating terms of service and regional restrictions.
- Distillation is a legitimate training method, but here it was used illicitly to acquire capabilities quickly and cheaply.
- Growing intensity and sophistication of these campaigns require coordinated industry and policy action.
- Illicitly distilled models lack safeguards, posing national security risks (e.g., bioweapons, cyber threats).
- Foreign labs can feed unprotected capabilities into military, intelligence, and surveillance systems.
- Distillation attacks undermine US export controls by allowing foreign labs to bypass restrictions.
- DeepSeek targeted reasoning, grading tasks, and censorship-safe alternatives (150,000+ exchanges).
- Moonshot focused on agentic reasoning, coding, and computer vision (3.4M+ exchanges).
- MiniMax aimed at agentic coding and tool use (13M+ exchanges), pivoting quickly to new model releases.
- Labs use proxy services ('hydra clusters') to bypass regional access restrictions.
- Anthropic's response includes detection, intelligence sharing, access controls, and countermeasures.
- No single company can solve this; industry-wide coordination is needed.
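The distillation technique the notes refer to can be sketched in miniature: a student model is trained to match a teacher model's output *distribution* (soft labels), not just its final answers. The sketch below shows only the core objective; function names and the temperature value are illustrative, and this is not any lab's actual training pipeline.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    In a real training loop this term is minimized so the student mimics the
    teacher's full output distribution -- which is why large volumes of
    teacher outputs (the "exchanges" above) are valuable to an attacker.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student whose logits already match the teacher's incurs zero loss;
# a mismatched student incurs a positive loss to be driven down by training.
teacher = [2.0, 1.0, 0.1]
aligned_loss = distillation_loss([2.0, 1.0, 0.1], teacher)
mismatched_loss = distillation_loss([0.1, 1.0, 2.0], teacher)
```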
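One ingredient of detection mentioned above can be illustrated with a toy heuristic: group requests by a shared infrastructure fingerprint (e.g., a proxy exit) and flag fingerprints that serve many accounts at high aggregate volume, as a "hydra cluster" of fraudulent accounts would. This is a hypothetical sketch with an invented data schema, not Anthropic's actual detection system.

```python
from collections import defaultdict

def flag_coordinated_clusters(requests, volume_threshold=10_000):
    """Flag infrastructure fingerprints shared by multiple accounts at high volume.

    `requests` is an iterable of (account_id, fingerprint) pairs, where a
    fingerprint might be a proxy exit address or TLS signature (hypothetical
    schema). Returns {fingerprint: sorted account ids} for each fingerprint
    whose total request count reaches the threshold across 2+ accounts.
    """
    accounts_by_fp = defaultdict(set)
    volume_by_fp = defaultdict(int)
    for account_id, fingerprint in requests:
        accounts_by_fp[fingerprint].add(account_id)
        volume_by_fp[fingerprint] += 1
    return {
        fp: sorted(accounts_by_fp[fp])
        for fp, volume in volume_by_fp.items()
        if volume >= volume_threshold and len(accounts_by_fp[fp]) > 1
    }

# Toy data with a low threshold: fingerprint "x" is shared by two accounts
# at high volume (flagged); "y" is a single low-volume account (not flagged).
sample = [("acct_a", "x")] * 3 + [("acct_b", "x")] * 3 + [("acct_c", "y")] * 2
flagged = flag_coordinated_clusters(sample, volume_threshold=5)
```

Real systems would combine many more signals (prompt similarity, payment patterns, timing), but the shared-infrastructure grouping above captures the basic idea of tying thousands of accounts back to one operator.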