Improving PixelMelt's Kindle Web Deobfuscator
- #OCR
- #eBooks
- #DRM
- PixelMelt published a method to download Amazon Kindle books without DRM by spoofing a web browser and reconstructing obfuscated SVGs.
- The initial approach had OCR accuracy problems, especially with easily confused characters such as full stops and commas.
- Line breaks ended up in the wrong places, undermining the reflowable layout that eBooks depend on.
- A new approach was developed that OCRs entire pages rather than single characters, giving the OCR engine surrounding context and therefore better accuracy.
- Characters were extracted, resized, and placed on a blank page at the positions given in the book's JSON data; the resulting page was then OCRed with Tesseract 5.
- The OCR results were still imperfect: superscript numerals were missed, and the output is plain text with no semantic structure.
- Images and certain formatting elements were not recoverable due to encryption and OCR limitations.
- The author suggests avoiding Amazon for eBook purchases and recommends Kobo, whose DRM is easier to remove.
- Comments from readers praise the effort as part of a broader resistance against restrictive digital practices.
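The page-reconstruction step summarised above (extract each character, resize it, place it on a blank page according to JSON positions, then OCR the whole page) can be illustrated with a minimal stdlib-only sketch. It is not PixelMelt's or the author's code. It assumes the glyphs have already been rasterised into 0/1 bitmaps, and the JSON field names (`id`, `x`, `y`, `w`, `h`, `page.width`, `page.height`) are hypothetical stand-ins for whatever the real layout data uses. The composed page is written out as a plain PBM image, which could then be passed to the Tesseract CLI (for example `tesseract page.pbm out`).

```python
import json

# Hypothetical layout data: the real JSON's field names are not given in
# the source, so these keys are assumptions for illustration only.
LAYOUT_JSON = """
{"page": {"width": 12, "height": 6},
 "glyphs": [
   {"id": "a", "x": 1, "y": 1, "w": 4, "h": 4},
   {"id": "b", "x": 7, "y": 2, "w": 2, "h": 2}
 ]}
"""


def resize_nearest(bitmap: list[list[int]], new_w: int, new_h: int) -> list[list[int]]:
    """Nearest-neighbour resize of a 0/1 bitmap to new_w x new_h pixels."""
    old_h, old_w = len(bitmap), len(bitmap[0])
    return [
        [bitmap[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def compose_page(layout: dict, glyph_bitmaps: dict) -> list[list[int]]:
    """Place each resized glyph onto a blank (all-zero) page at its JSON position."""
    page_w, page_h = layout["page"]["width"], layout["page"]["height"]
    page = [[0] * page_w for _ in range(page_h)]
    for g in layout["glyphs"]:
        scaled = resize_nearest(glyph_bitmaps[g["id"]], g["w"], g["h"])
        for dy, row in enumerate(scaled):
            for dx, px in enumerate(row):
                y, x = g["y"] + dy, g["x"] + dx
                # Clip anything that falls outside the page; only copy ink pixels.
                if px and 0 <= y < page_h and 0 <= x < page_w:
                    page[y][x] = 1
    return page


def to_pbm(page: list[list[int]]) -> str:
    """Serialise the page as plain-text PBM (P1), where 1 means a black pixel."""
    lines = ["P1", f"{len(page[0])} {len(page)}"]
    lines += [" ".join(str(px) for px in row) for row in page]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    layout = json.loads(LAYOUT_JSON)
    # Toy glyph bitmaps standing in for rasterised SVG characters.
    glyphs = {"a": [[1, 0], [0, 1]], "b": [[1]]}
    # Redirect this output to page.pbm, then OCR it, e.g.: tesseract page.pbm out
    print(to_pbm(compose_page(layout, glyphs)), end="")
```

Composing a full page before OCR, rather than recognising glyphs one by one, is what lets the OCR engine see characters in the context of words and lines, which is the accuracy gain the new approach relies on.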