Show HN: Defuddle, an HTML-to-Markdown alternative to Readability

a year ago
  • #readability
  • #web-development
  • #html-processing
  • Defuddle is a tool that removes unnecessary elements from web pages to make them easily readable.
  • It cleans up web pages by removing clutter like comments, sidebars, headers, footers, and other non-essential elements.
  • Defuddle aims to output clean and consistent HTML documents, useful for HTML-to-Markdown converters.
  • It can replace Mozilla Readability, with some differences: it is more forgiving, removing fewer uncertain elements, and it produces consistent output for footnotes, math, and code blocks.
  • Defuddle extracts metadata from the page, including schema.org data, and uses the page's mobile styles to guess which elements are unnecessary.
  • Install it with npm. Using it in Node.js also requires JSDOM.
  • Defuddle returns an object with properties like author, content, description, domain, favicon, image, and more.
  • It is available in three bundles: Core, Full (with math equation parsing), and Node.js (optimized for Node.js environments).
  • Options include a debug mode, the URL of the page, Markdown conversion, and settings that control which selectors are used to remove elements.
  • Defuddle standardizes HTML elements, removes anchor links from headings, and standardizes code blocks, footnotes, and math elements.
  • Building the package requires Node.js and npm, with commands to install dependencies and build the package.
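
To make the returned object more concrete, here is a minimal TypeScript sketch of how a caller might turn such a result into a Markdown note with YAML front matter. Only the field names (author, content, description, domain, favicon, image) come from the summary above. The `ExtractedPage` interface and the `toNote` helper are hypothetical names introduced for illustration and are not part of Defuddle's API. The sketch also assumes every field is a string and that `content` already holds the converted Markdown.

```typescript
// Hypothetical result shape: the field names are taken from the summary,
// but the interface name and the string types are assumptions.
interface ExtractedPage {
  author: string;
  content: string; // assumed to be Markdown (conversion option enabled)
  description: string;
  domain: string;
  favicon: string;
  image: string;
}

// Build a Markdown note: YAML front matter from the metadata, then the body.
// toNote is an illustrative helper, not a Defuddle function.
function toNote(page: ExtractedPage): string {
  const keys = ["author", "description", "domain", "image"] as const;
  const lines: string[] = [];
  for (const key of keys) {
    const value = page[key];
    // Skip empty metadata. JSON.stringify yields a double-quoted string,
    // which is also a valid YAML double-quoted scalar.
    if (value !== "") lines.push(`${key}: ${JSON.stringify(value)}`);
  }
  return ["---", ...lines, "---", "", page.content].join("\n");
}
```

Empty fields are skipped rather than written as blank keys, so pages with sparse metadata still produce tidy front matter.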