Noroboto: Lying Fonts and Mitigation in Rust
8 hours ago
- #LegalTech AI
- #Unicode Security
- #PDF Rendering
- The most exciting phrase in science, heralding new discoveries, is 'That's funny...'
- Switching PDF rendering from PDFium to hayro, a Rust library, led to the discovery of a bug in which double-t ('tt') glyphs mapped to non-Unicode values; the same bug also affected PDFium.
- The discovery raised concerns that adversaries could exploit the complexity of the PDF specification and imperfections in legal tech stacks, such as those of AI-native law firms.
- Noroboto.ttf is a malicious font that obfuscates the Unicode mappings of embedded fonts by assigning glyphs to Private Use Area (PUA) code points, with the aim of deceiving AI agents in legal pipelines.
- Advanced LLMs (e.g., ChatGPT 5.5) partially defeated full obfuscation, but partial obfuscation and Unicode replacement attacks proved more effective because they exploit agent laziness.
- Partial obfuscation hides only the adversarial terms (e.g., 'successors and assigns' in an NDA), while replacement decouples what a human sees from the underlying Unicode values (e.g., the page shows 'Maryland' while the extracted text reads 'Delaware').
- A proof-of-concept mitigation in Tritium, written in Rust, verifies font accuracy by comparing expected ASCII strings with OCR results and computing an accuracy score based on Levenshtein distance.
- The approach builds a font atlas, renders its glyphs, and runs OCR on the result to detect deceptive fonts; tests confirmed perfect accuracy for legitimate fonts and imperfect accuracy for malicious ones.
- The post notes the ethical and legal considerations of such attacks, references prior art, and highlights how easily these attacks can be generated with off-the-shelf models.
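Full obfuscation through PUA code points leaves a visible trace in the extracted text itself. The following is a minimal Rust sketch of a cheap pre-flight heuristic, assuming the pipeline already has the extracted text as a `&str`. The function names `is_private_use` and `pua_ratio` are hypothetical, and this is not the Tritium mitigation; it only flags text that contains PUA characters.

```rust
/// Returns true if `c` falls in one of Unicode's Private Use Areas:
/// the BMP PUA (U+E000..=U+F8FF) or the two supplementary PUA planes
/// (U+F0000..=U+FFFFD and U+100000..=U+10FFFD).
fn is_private_use(c: char) -> bool {
    matches!(
        c as u32,
        0xE000..=0xF8FF | 0xF_0000..=0xF_FFFD | 0x10_0000..=0x10_FFFD
    )
}

/// Hypothetical helper: fraction of non-whitespace characters in `text`
/// that are PUA code points (0.0 for empty or all-whitespace input).
fn pua_ratio(text: &str) -> f64 {
    let mut total = 0usize;
    let mut pua = 0usize;
    for c in text.chars().filter(|c| !c.is_whitespace()) {
        total += 1;
        if is_private_use(c) {
            pua += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        pua as f64 / total as f64
    }
}

fn main() {
    let honest = "successors and assigns";
    // Toy example: the first word's glyphs were mapped to PUA code points.
    let obfuscated = "\u{E001}\u{E002}\u{E003} and assigns";
    println!("{:.2}", pua_ratio(honest)); // prints "0.00"
    println!("{:.2}", pua_ratio(obfuscated)); // prints "0.23" (3 of 13)
}
```

A non-zero ratio could route a document to a slower OCR-based check. The heuristic does nothing against the replacement attack, because there the lying mapping points at ordinary code points such as the letters of 'Delaware'.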
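The OCR-based font check summarized above can be sketched as a toy model. This is an illustration under stated assumptions, not Tritium's code. The score `1 - distance / max(len)` is an assumed normalization of the Levenshtein distance. `render_and_ocr` is a hypothetical stand-in for rendering a string with the font and running a perfect OCR pass; it uses a map from each code point to the character its glyph actually depicts, so an honest font has an empty map.

```rust
use std::collections::HashMap;

/// Dynamic-programming Levenshtein distance over Unicode scalar values,
/// keeping only two rows of the DP table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Assumed scoring rule: 1.0 means the OCR output matched exactly.
fn accuracy(expected: &str, ocr: &str) -> f64 {
    let max_len = expected.chars().count().max(ocr.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(expected, ocr) as f64 / max_len as f64
}

/// Hypothetical stand-in for "render with the font, then OCR the image".
/// Code points missing from the map are drawn as themselves.
fn render_and_ocr(text: &str, glyph_looks_like: &HashMap<char, char>) -> String {
    text.chars()
        .map(|c| *glyph_looks_like.get(&c).unwrap_or(&c))
        .collect()
}

fn main() {
    // Atlas of all printable ASCII characters (U+0020..=U+007E, 95 chars).
    let atlas: String = (0x20u8..=0x7E).map(char::from).collect();

    let honest: HashMap<char, char> = HashMap::new();
    // Toy lying font: the glyph for 'D' draws 'M', the glyph for 'e' draws 'a'.
    let lying: HashMap<char, char> = [('D', 'M'), ('e', 'a')].into_iter().collect();

    println!("honest: {:.3}", accuracy(&atlas, &render_and_ocr(&atlas, &honest))); // 1.000
    println!("lying:  {:.3}", accuracy(&atlas, &render_and_ocr(&atlas, &lying))); // 0.979
}
```

Rendering every printable ASCII character means a remapped glyph cannot hide outside the tested set. With a real OCR engine an honest font might not always score exactly 1.0, so a practical check would likely compare against a threshold rather than require equality.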