Researchers Deanonymize Reddit and Hacker News Users at Scale

5 hours ago
  • #LLMs
  • #deanonymization
  • #privacy
  • An ETH Zurich study shows that LLMs can deanonymize pseudonymous accounts, reaching 68% recall at 90% precision.
  • A four-stage pipeline was used: Extract identity signals → Search via embeddings → Reason over candidates → Calibrate confidence scores.
  • Results at 99% precision: Hacker News → LinkedIn matching (45.1% recall), Reddit movie communities (2.8% recall), temporal Reddit splits (38.4% recall).
  • Fully autonomous agents correctly identified 67% of users at 90% precision, at a cost of $1-4 per deanonymization.
  • Classical deanonymization methods reached far lower recall (0-0.2%).
  • Pseudonymity no longer offers practical protection: a persistent username can be linked to a real identity.
  • Users who post more are easier to identify: recall is 48% for users who share 10 or more movies vs. 3% for users who share one.
  • Platforms should rate-limit API access, restrict bulk data exports, and weigh the privacy costs of publicly scrapable data.
  • Researchers and activists should compartmentalize identities and assume LLM-powered deanonymization is a threat.
  • LLMs excel at extracting signals from unstructured text, at semantic search, and at reasoning, which lowers the cost of deanonymization.
  • Threatened groups include whistleblowers, activists, abuse survivors, and others relying on anonymity.
  • Mitigations like k-anonymity and differential privacy are ineffective for text anonymization.
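
The "recall at precision" figures quoted above can be read as the outcome of the calibration stage: the attacker only acts on guesses whose confidence clears a threshold, and the threshold is set so that a target share of those guesses is correct. The sketch below is a minimal illustration of that metric, not the study's code. The function name `recall_at_precision`, the `(confidence, is_correct)` input format, the toy scores, and the definition of recall as correct identifications divided by all users in the evaluation set are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def recall_at_precision(
    predictions: List[Tuple[float, bool]],
    total_users: int,
    target_precision: float,
) -> Tuple[Optional[float], float]:
    """Return (threshold, recall) for the best cutoff meeting target_precision.

    predictions: one (confidence, is_correct) pair per user the system guessed for.
    total_users: size of the evaluation set, including users with no guess
                 (assumed recall denominator).
    """
    # Accept guesses from most to least confident and track running precision.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best_threshold: Optional[float] = None
    best_recall = 0.0
    correct = 0
    for i, (confidence, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Tied confidences must be accepted or rejected together,
        # so only evaluate a cutoff after the last item of a tie group.
        if i < len(ranked) and ranked[i][0] == confidence:
            continue
        precision = correct / i          # share of accepted guesses that are right
        recall = correct / total_users   # share of all users correctly identified
        if precision >= target_precision and recall > best_recall:
            best_threshold, best_recall = confidence, recall
    return best_threshold, best_recall


# Toy data (hypothetical): six guesses made over an evaluation set of ten users.
toy = [(0.99, True), (0.95, True), (0.90, False),
       (0.80, True), (0.60, False), (0.40, True)]

strict = recall_at_precision(toy, total_users=10, target_precision=0.99)
loose = recall_at_precision(toy, total_users=10, target_precision=0.75)
print(strict)  # (0.95, 0.2)
print(loose)   # (0.8, 0.3)
```

On the toy data, demanding 99% precision keeps only the two most confident guesses, while relaxing the target to 75% admits a third correct guess at the price of one wrong one. This is the same trade-off that separates the 99%-precision and 90%-precision results in the summary.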