Researchers Deanonymize Reddit and Hacker News Users at Scale
7 hours ago
- #LLMs
- #deanonymization
- #privacy
- An ETH Zurich study shows that LLMs can deanonymize pseudonymous accounts, reaching up to 68% recall at 90% precision.
- A four-stage pipeline was used: Extract identity signals → Search via embeddings → Reason over candidates → Calibrate confidence scores.
- Results: Hacker News → LinkedIn (45.1% recall at 99% precision), Reddit movie communities (2.8% recall at 99% precision), Temporal Reddit splits (38.4% recall at 99% precision).
- Fully autonomous agents correctly identified 67% of users at 90% precision, costing $1-4 per deanonymization.
- Classical deanonymization methods reached far lower recall (0-0.2%).
- Pseudonymity no longer offers practical protection: persistent usernames can be linked to real identities.
- More posts make users easier to identify (48% recall for users sharing 10+ movies vs. 3% for one movie).
- Platforms should rate-limit API access, restrict bulk data exports, and weigh the privacy costs of publicly scrapable data.
- Researchers and activists should compartmentalize identities and assume LLM-powered deanonymization is a threat.
- LLMs excel at extracting unstructured signals, semantic search, and reasoning, reducing deanonymization costs.
- Threatened groups include whistleblowers, activists, abuse survivors, and others relying on anonymity.
- Established techniques such as k-anonymity and differential privacy do not effectively anonymize free text.
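
The "recall at X% precision" figures above can be read as follows: the system only commits to a guess when its calibrated confidence clears a threshold, the threshold is set so that at least X% of committed guesses are correct, and recall is the share of all target users identified correctly under that threshold. Below is a minimal Python sketch of that metric. It is not the study's evaluation code; the function name `recall_at_precision`, the toy scores, and the assumption that recall is measured over every user with a true match are all illustrative.

```python
from typing import List, Tuple


def recall_at_precision(
    predictions: List[Tuple[float, bool]],
    total_users: int,
    target_precision: float,
) -> float:
    """Best recall achievable while keeping precision >= target_precision.

    predictions: one (confidence, is_correct) pair per user the system
                 produced a guess for (hypothetical format).
    total_users: number of users who have a true match, including those
                 the system never guessed for (assumed denominator).
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")

    # Rank guesses from most to least confident; each prefix of this
    # ranking corresponds to one possible confidence threshold.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)

    best_recall = 0.0
    correct = 0
    for i, (score, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Guesses with identical scores cannot be separated by a threshold,
        # so only evaluate once the whole tied group has been counted.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        precision = correct / i
        if precision >= target_precision:
            best_recall = max(best_recall, correct / total_users)
    return best_recall


# Toy data: 8 guesses for a population of 10 target users.
sample = [
    (0.99, True), (0.95, True), (0.90, True), (0.80, False),
    (0.70, True), (0.60, False), (0.40, True), (0.30, False),
]
print(recall_at_precision(sample, total_users=10, target_precision=0.90))  # 0.3
```

In this toy data, loosening the target to 75% precision raises recall from 0.3 to 0.4. This is the same trade-off behind the gap between the 90% and 99% precision figures in the summary.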