X-ray: a Python library for finding bad redactions in PDF documents

4 months ago
  • #security
  • #PDF
  • #redaction
  • x-ray is a Python library for detecting improper redactions in PDF documents.
  • Improper redactions often consist of black rectangles or highlights drawn over text that is still present in the file, so the covered text can still be selected and read.
  • The tool analyzes PDFs to identify such bad redactions and outputs JSON with details like page numbers, bounding boxes, and hidden text.
  • x-ray can be installed via pip or uv and used via command line or as a Python module.
  • It supports local files, URLs, and in-memory PDF bytes.
  • Under the hood, x-ray uses PyMuPDF to parse the PDF and inspects each candidate rectangle as a rendered image.
  • The project is open-source (BSD license) and welcomes contributions, though a signed CLA is required.
  • Releases are automated via GitHub Actions and are triggered by updating the version and tagging the commit.
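
The core idea in the summary above, text that still exists underneath a drawn rectangle, can be illustrated with a small geometric check. The following is a minimal sketch in plain Python, not x-ray's actual code: the names `Word`, `overlap_area`, `find_bad_redactions`, and the `min_cover` threshold are hypothetical, the input is hand-written rather than parsed from a PDF, and the JSON shape (page number mapped to a list of bounding boxes and hidden text) is only an assumption based on the fields the summary mentions. The real library relies on PyMuPDF for parsing and additionally inspects rectangles as images, which this sketch does not attempt.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Boxes are (x0, y0, x1, y1) in page coordinates. All names are hypothetical.
Box = Tuple[float, float, float, float]


@dataclass
class Word:
    text: str
    bbox: Box


def box_area(box: Box) -> float:
    """Area of a box, clamped to zero for degenerate boxes."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (zero if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def find_bad_redactions(
    pages: Dict[int, Tuple[List[Box], List[Word]]],
    min_cover: float = 0.5,  # assumed threshold: fraction of a word that must be covered
) -> Dict[str, List[dict]]:
    """Report rectangles that cover words still present on the page."""
    report: Dict[str, List[dict]] = {}
    for page_number, (rectangles, words) in pages.items():
        hits = []
        for rect in rectangles:
            hidden = [
                w.text
                for w in words
                if box_area(w.bbox) > 0
                and overlap_area(rect, w.bbox) / box_area(w.bbox) >= min_cover
            ]
            if hidden:
                hits.append({"bbox": list(rect), "text": " ".join(hidden)})
        if hits:
            report[str(page_number)] = hits
    return report


# Hand-written example: page 1 has a rectangle over "SECRET"; page 2 is clean.
pages = {
    1: (
        [(100, 100, 200, 120)],
        [Word("SECRET", (105, 102, 160, 118)), Word("public", (210, 102, 260, 118))],
    ),
    2: ([(0, 0, 10, 10)], [Word("visible", (50, 50, 90, 60))]),
}
report = find_bad_redactions(pages)
print(json.dumps(report, indent=2))
```

Only pages with at least one covered word appear in the report, so a clean document produces an empty object. A purely geometric test like this would also flag ordinary filled shapes behind visible text, which is presumably one reason a real tool needs a further check on the rectangle itself.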