Show HN: Pseudonymizing sensitive data for LLMs without losing context
8 hours ago
- #Data Privacy
- #Incident Response
- #LLM Security
- Built a Data Loss Prevention (DLP) proxy that pseudonymizes sensitive data before it reaches an LLM while retaining the context needed for triage.
- The initial regex approach caused hallucinations; later versions added named-entity recognition (NER), structured pseudonyms, and context-aware replacements.
- V3 preserves metadata such as the ASN of an IP address and classifies domains, so the model can still reason about the data without seeing the real values.
- Reduces false positives with layered detection, skiplists, and allowlists, so technical terms are not redacted by mistake.
- Handles streaming responses with a tail buffer so that pseudonyms split across chunk boundaries are still restored correctly.
- Open-sourced as token-proxy on GitHub; it is provider-agnostic and configurable for different environments.
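
To make the pseudonymize-then-restore idea concrete, here is a minimal Python sketch of structured pseudonyms plus a tail buffer for streamed output. It is not taken from token-proxy: the `<IP_n>` placeholder format, the class names `Pseudonymizer` and `StreamRestorer`, and the IPv4-only regex are all assumptions made for illustration. The real project reportedly layers NER and metadata such as ASN on top of this kind of mapping.

```python
import re

# Hypothetical formats: a simple IPv4 matcher and an <IP_n> placeholder.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
TOKEN_RE = re.compile(r"<IP_\d+>")
# Matches an unfinished placeholder ("<", "<I", "<IP", "<IP_", "<IP_12")
# sitting at the very end of the buffered text.
PARTIAL_RE = re.compile(r"<(?:I(?:P(?:_\d*)?)?)?\Z")


class Pseudonymizer:
    """Maps each distinct IPv4 address to a stable placeholder and back."""

    def __init__(self):
        self.forward = {}  # real value -> placeholder
        self.reverse = {}  # placeholder -> real value

    def _token_for(self, value):
        if value not in self.forward:
            token = f"<IP_{len(self.forward) + 1}>"
            self.forward[value] = token
            self.reverse[token] = value
        return self.forward[value]

    def pseudonymize(self, text):
        return IPV4_RE.sub(lambda m: self._token_for(m.group(0)), text)

    def restore(self, text):
        # Unknown placeholders are left untouched.
        return TOKEN_RE.sub(
            lambda m: self.reverse.get(m.group(0), m.group(0)), text
        )


class StreamRestorer:
    """Restores placeholders in streamed chunks using a tail buffer."""

    def __init__(self, pseudonymizer):
        self.pseudonymizer = pseudonymizer
        self.tail = ""  # held-back text that may be a split placeholder

    def feed(self, chunk):
        buf = self.tail + chunk
        partial = PARTIAL_RE.search(buf)
        cut = partial.start() if partial else len(buf)
        self.tail = buf[cut:]  # keep the possible placeholder prefix
        return self.pseudonymizer.restore(buf[:cut])

    def flush(self):
        # Call once the stream ends to emit whatever is still buffered.
        out, self.tail = self.tail, ""
        return self.pseudonymizer.restore(out)


p = Pseudonymizer()
masked = p.pseudonymize("Host 10.0.0.5 contacted 203.0.113.7")
print(masked)  # Host <IP_1> contacted <IP_2>

restorer = StreamRestorer(p)
chunks = ["Block <I", "P_1> and <IP", "_2", "> now"]
print("".join(restorer.feed(c) for c in chunks) + restorer.flush())
# Block 10.0.0.5 and 203.0.113.7 now
```

The restorer holds back only text that could still grow into a placeholder, so ordinary output (including a stray `<`) is delayed by at most a few characters rather than a whole chunk.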