Show HN: Defuddle, an HTML-to-Markdown alternative to Readability
a year ago
- #readability
- #web-development
- #html-processing
- Defuddle is a tool that removes unnecessary elements from web pages to make them easily readable.
- It cleans up web pages by removing clutter like comments, sidebars, headers, footers, and other non-essential elements.
- Defuddle aims to output clean and consistent HTML documents, useful for HTML-to-Markdown converters.
- It can replace Mozilla Readability, with some differences: it is more forgiving (it removes fewer uncertain elements) and produces consistent output for footnotes, math, and code blocks.
- Defuddle extracts metadata from the page, including schema.org data, and uses the page's mobile styles to guess which elements are unnecessary.
- Install it from npm; using it in Node.js also requires JSDOM.
- Defuddle returns an object with properties like author, content, description, domain, favicon, image, and more.
- It is available in three bundles: Core, Full (with math equation parsing), and Node.js (optimized for Node.js environments).
- Options include a debug mode, the URL of the page, Markdown conversion, and settings that control selector-based removal.
- Defuddle standardizes HTML elements, removes anchor links from headings, and standardizes code blocks, footnotes, and math elements.
- Building the package from source requires Node.js and npm: install the dependencies, then run the build.
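
To make the returned object more concrete, here is a minimal TypeScript sketch of how a caller might turn it into a Markdown note. It assumes only the fields named in the summary above (author, content, description, domain, favicon, image) and assumes they are all strings. The `ExtractedPage` interface and the `toNote` helper are hypothetical names used for illustration; they are not part of Defuddle, and the real response has more properties than are shown here.

```typescript
// Hypothetical shape: only the fields named in the summary, assumed to be strings.
interface ExtractedPage {
  author: string;
  content: string; // cleaned content (Markdown, if conversion was requested)
  description: string;
  domain: string;
  favicon: string;
  image: string;
}

// Hypothetical helper, not part of Defuddle: writes the non-empty metadata
// fields as a front-matter block, followed by the extracted content.
function toNote(page: ExtractedPage): string {
  const meta: Array<[string, string]> = [
    ["author", page.author],
    ["description", page.description],
    ["domain", page.domain],
    ["favicon", page.favicon],
    ["image", page.image],
  ];
  const frontMatter = meta
    .filter(([, value]) => value.trim() !== "") // skip fields the page did not provide
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`) // quote values safely
    .join("\n");
  return `---\n${frontMatter}\n---\n\n${page.content.trim()}\n`;
}

// Example input, written by hand rather than produced by a real extraction.
const sample: ExtractedPage = {
  author: "Jane Doe",
  content: "# Hello\n\nFirst paragraph.",
  description: "",
  domain: "example.com",
  favicon: "",
  image: "",
};

console.log(toNote(sample));
```

Skipping empty fields keeps the front matter short when a page exposes little metadata, and `JSON.stringify` quotes each value so that colons or quotation marks in a description do not break the block.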