Show HN: Pseudonymizing sensitive data for LLMs without losing context

Built a Data Loss Prevention (DLP) proxy that pseudonymizes sensitive data sent to LLMs while retaining the context needed for triage.
The initial regex-only approach caused hallucinations; results improved with named-entity recognition (NER), structured pseudonyms, and context-aware replacements.
V3 preserves metadata such as the ASN of each IP address and classifies domains, so the model can still reason about them without seeing the real values.
False positives are reduced through layered detection, skiplists, and allowlists, which keep technical terms from being redacted.
Streaming responses are handled with a tail buffer, so pseudonyms split across chunk boundaries are still restored correctly.
Open-sourced on GitHub as token-proxy; it is provider-agnostic and configurable for different environments.
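
Here is a minimal Python sketch of structured, consistent pseudonyms with an allowlist and preserved IP metadata. It is not token-proxy's code, and several parts are hypothetical: the `Pseudonymizer` class, the `IP_1_AS64500`-style pseudonym format, and the caller-supplied `ip_asn` table. Detection here uses two regexes only, with no NER, and domain classification is omitted.

```python
import re
from collections import defaultdict

# Regex-only detectors for this sketch; the post layers NER on top of pattern matching.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


class Pseudonymizer:
    """Hypothetical helper: maps each sensitive value to a stable, typed pseudonym."""

    def __init__(self, allowlist=(), ip_asn=None):
        self.allowlist = set(allowlist)   # values that are never replaced
        self.ip_asn = ip_asn or {}        # hypothetical IP -> ASN table supplied by the caller
        self.forward = {}                 # real value -> pseudonym
        self.reverse = {}                 # pseudonym -> real value
        self.counters = defaultdict(int)  # next index per entity type

    def _pseudonym(self, kind, value):
        # Reuse an existing pseudonym so the same value always maps to the same name.
        if value not in self.forward:
            self.counters[kind] += 1
            name = f"{kind}_{self.counters[kind]}"
            if kind == "IP" and value in self.ip_asn:
                name += "_" + self.ip_asn[value]  # keep the ASN as triage context
            self.forward[value] = name
            self.reverse[name] = value
        return self.forward[value]

    def pseudonymize(self, text):
        for kind, pattern in PATTERNS.items():
            def replace(match, kind=kind):
                value = match.group(0)
                return value if value in self.allowlist else self._pseudonym(kind, value)
            text = pattern.sub(replace, text)
        return text

    def restore(self, text):
        # Replace longest pseudonyms first, so "IP_1" never rewrites part of "IP_10".
        for name in sorted(self.reverse, key=len, reverse=True):
            text = text.replace(name, self.reverse[name])
        return text
```

Consider a `Pseudonymizer` built with `allowlist={"8.8.8.8"}` and `ip_asn={"203.0.113.7": "AS64500"}`. Pseudonymizing "Login from 203.0.113.7 by alice@example.com; DNS via 8.8.8.8" yields "Login from IP_1_AS64500 by EMAIL_1; DNS via 8.8.8.8". The allowlisted resolver stays readable, and `restore` maps the pseudonyms in the model's reply back to the real values.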

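The streaming tail buffer can be sketched as follows. The sketch assumes the pseudonym-to-real-value mapping from the request is available while the response streams back. `StreamRestorer` is a hypothetical name, and this is one possible approach rather than the project's implementation. Any text that could still be the beginning of a pseudonym is held back until the next chunk, or the end of the stream, settles it.

```python
class StreamRestorer:
    """Hypothetical sketch: restore pseudonyms in a chunked LLM response."""

    def __init__(self, reverse):
        self.reverse = dict(reverse)  # pseudonym -> real value
        # Longest first, so a longer pseudonym wins over a shorter one it starts with.
        self.names = sorted(self.reverse, key=len, reverse=True)
        self.tail = ""  # unemitted text that may be a partial pseudonym

    def _drain(self, final):
        buf, out, i = self.tail, [], 0
        while i < len(buf):
            rest = buf[i:]
            # If the remaining text is a proper prefix of some pseudonym, wait for more input.
            if not final and any(len(n) > len(rest) and n.startswith(rest) for n in self.names):
                break
            match = next((n for n in self.names if buf.startswith(n, i)), None)
            if match:
                out.append(self.reverse[match])
                i += len(match)
            else:
                out.append(buf[i])
                i += 1
        self.tail = buf[i:]  # keep the undecided suffix for the next chunk
        return "".join(out)

    def feed(self, chunk):
        """Add a streamed chunk and return the text that is safe to emit now."""
        self.tail += chunk
        return self._drain(final=False)

    def flush(self):
        """End of stream: emit whatever is still buffered."""
        return self._drain(final=True)
```

The hold check uses a proper prefix because one pseudonym can be a prefix of another, as with `IP_1` and `IP_10`. A chunk ending in `IP_1` is therefore held until the next character shows which pseudonym it is. A fixed-size tail would not guarantee that a cut never lands inside a pseudonym.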