X-ray: a Python library for finding bad redactions in PDF documents
4 months ago
- #security
- #redaction
- x-ray is a Python library for detecting improper redactions in PDF documents.
- Improper redactions often place a black rectangle or highlight over text that is still present in the file, so the text can still be selected, copied, and read.
- The tool analyzes PDFs to find such bad redactions and outputs JSON keyed by page number, listing each bad redaction's bounding box and the hidden text beneath it.
- x-ray can be installed with pip or uv and used from the command line or as a Python module.
- It supports local files, URLs, and in-memory PDF bytes.
- Under the hood, x-ray uses PyMuPDF for PDF parsing and image-based rectangle inspection.
- The project is open source (BSD license) and welcomes contributions; contributors must sign a CLA first.
- Releases are automated via GitHub Actions, triggered by version updates and tags.
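The core idea behind the check, text that still exists beneath a drawn rectangle, can be sketched without any PDF library. This is a minimal illustration, not x-ray's implementation and not its API: `Rect`, `TextSpan`, and `find_bad_redactions` are hypothetical names, the input is assumed to be rectangles and text spans already extracted per page (x-ray obtains these through PyMuPDF), and x-ray's image-based inspection of each rectangle is omitted. The result mimics the output shape described above: entries keyed by page number, each with a bounding box and the hidden text.

```python
import json
from dataclasses import dataclass


# Hypothetical geometry type for illustration only (not part of x-ray).
@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


# Hypothetical text span: a piece of text plus its bounding box on the page.
@dataclass(frozen=True)
class TextSpan:
    bbox: Rect
    text: str


def find_bad_redactions(
    pages: dict[int, tuple[list[Rect], list[TextSpan]]],
) -> dict[int, list[dict]]:
    """Map page number -> list of {"bbox", "text"} for text under a rectangle.

    Simplifying assumption: every rectangle passed in is an opaque box, and a
    span counts as hidden only if a rectangle fully covers it. x-ray also
    inspects the rectangle region as an image; that step is skipped here.
    """
    results: dict[int, list[dict]] = {}
    for page_number, (rects, spans) in pages.items():
        for rect in rects:
            # Collect every text span that sits completely under this rectangle.
            hidden = [span.text for span in spans if rect.contains(span.bbox)]
            if hidden:
                results.setdefault(page_number, []).append({
                    "bbox": (rect.x0, rect.y0, rect.x1, rect.y1),
                    "text": " ".join(hidden),
                })
    return results


if __name__ == "__main__":
    pages = {
        1: (
            [Rect(50, 70, 300, 90)],                             # a black box on page 1
            [TextSpan(Rect(55, 72, 290, 88), "secret text"),     # text under the box
             TextSpan(Rect(55, 100, 290, 115), "visible text")], # text outside it
        ),
    }
    print(json.dumps(find_bad_redactions(pages)))
    # prints {"1": [{"bbox": [50, 70, 300, 90], "text": "secret text"}]}
```

Requiring full containment keeps the sketch conservative: a rectangle that only clips the edge of a line is not flagged. A real tool also has to decide whether a rectangle is actually an opaque, uniformly colored box rather than, say, a table border or a light highlight, which is where inspecting the rendered region comes in.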