Hasty Briefs (beta)

Show HN: Stop AI scrapers from hammering your self-hosted blog

3 days ago
  • #AI scraping
  • #SEO
  • #web development
  • AI companies scrape websites for training data, and self-hosted blogs have limited options to prevent this.
  • Fuzzy Canary's approach is to embed invisible links to undesirable content (such as porn) in your pages, with the aim of tripping the scrapers' content safeguards.
  • Installation is via npm or pnpm: `npm i @fuzzycanary/core` or `pnpm add @fuzzycanary/core`.
  • There are two ways to use it: server-side (recommended) and client-side. Server-side is more effective because the canary is embedded directly in the HTML, so it does not depend on the client running JavaScript.
  • For React-based frameworks (Next.js, Remix), add the `<Canary />` component to the root layout.
  • Non-React frameworks can call `getCanaryHtml()` and insert the result at the start of `<body>`.
  • Client-side usage involves importing `@fuzzycanary/core/auto` in the entry file, which injects the canary at runtime.
  • Fuzzy Canary skips the canary for legitimate search engines (Google, Bing) by checking the user agent. Static sites are a problem here: the canary is baked into the HTML at build time, so there is no per-request user agent to check.
  • For static sites, client-side initialization is preferred because it can check `navigator.userAgent` at runtime. The trade-off is that it is less reliable against bots that don't run JavaScript, since they never see the injected canary.
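
The server-side flow described above (check the user agent, then place the canary at the start of `<body>`) can be sketched roughly as follows. This is a minimal illustration and not Fuzzy Canary's actual implementation: `placeholderCanaryHtml`, `isSearchEngine`, `injectCanary`, the user-agent pattern, and the placeholder link markup are all assumptions made for the example. In a real setup the string would presumably come from the library's `getCanaryHtml()`.

```typescript
// Hypothetical stand-in for the library's getCanaryHtml(). The real markup
// is not shown in the source, so this hidden link is only a placeholder.
function placeholderCanaryHtml(): string {
  return '<a href="https://example.invalid/canary" style="display:none" aria-hidden="true" tabindex="-1"></a>';
}

// Assumed allowlist: the source only names Google and Bing.
const SEARCH_BOT_PATTERN = /googlebot|bingbot/i;

// Returns true when the user agent looks like an allowlisted search crawler.
function isSearchEngine(userAgent: string | undefined): boolean {
  return userAgent !== undefined && SEARCH_BOT_PATTERN.test(userAgent);
}

// Inserts the canary right after the opening <body> tag, unless the request
// appears to come from a search engine or the page has no <body> tag.
function injectCanary(
  html: string,
  userAgent: string | undefined,
  canaryHtml: string = placeholderCanaryHtml(),
): string {
  if (isSearchEngine(userAgent)) {
    return html;
  }
  const bodyOpen = /<body\b[^>]*>/i;
  if (!bodyOpen.test(html)) {
    return html;
  }
  // A replacer function avoids special "$" handling in the canary string.
  return html.replace(bodyOpen, (tag) => tag + canaryHtml);
}
```

On a static site there is no request to inspect, so the same kind of check would have to run in the browser against `navigator.userAgent` instead, which is the trade-off the last two points describe.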