Show HN: Stop AI scrapers from hammering your self-hosted blog
3 days ago
- #AI scraping
- #SEO
- #web development
- AI companies scrape websites for training data, and self-hosted blogs have limited options to prevent this.
- Fuzzy Canary offers a solution: it embeds invisible links pointing at content AI companies filter out of training data (such as porn), so pages carrying the canary trip the scrapers' own content safeguards.
- Installation is via npm or pnpm: `npm i @fuzzycanary/core` or `pnpm add @fuzzycanary/core`.
- Two usage methods: server-side (recommended) and client-side. Server-side is more effective as it embeds the canary directly in HTML.
- For React-based frameworks (Next.js, Remix), add the `<Canary />` component to the root layout.
- Non-React frameworks can use `getCanaryHtml()` to insert the canary at the start of the `<body>` tag.
- Client-side usage involves importing `@fuzzycanary/core/auto` in the entry file, which injects the canary at runtime.
- Fuzzy Canary spares legitimate search engines (Google, Bing) by checking user agents at request time. Static sites can't do this server-side, because the canary is baked into the prebuilt HTML served to every visitor, crawler or not.
- For static sites, client-side initialization is therefore preferred, since it can check `navigator.userAgent` at runtime; the trade-off is that bots that never execute JavaScript won't see the canary at all.
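To make the server-side method concrete, here is a minimal sketch of the technique the bullets describe: generating a visually hidden link and splicing it in at the start of `<body>`. This is not the real `@fuzzycanary/core` implementation; `getCanaryHtmlSketch`, `injectCanary`, and the URL are illustrative stand-ins.

```typescript
// Sketch of server-side canary injection (illustrative names, not the
// actual @fuzzycanary/core internals). The link is hidden from human
// readers but present in the raw HTML a scraper downloads.
function getCanaryHtmlSketch(): string {
  return (
    '<a href="https://example.com/canary" ' +
    'style="position:absolute;left:-9999px" ' +
    'aria-hidden="true" tabindex="-1">canary</a>'
  );
}

// Insert the canary right after the opening <body> tag, mirroring the
// "start of the <body>" placement described above.
function injectCanary(html: string): string {
  return html.replace(/<body([^>]*)>/i, (match) => match + getCanaryHtmlSketch());
}

const page = "<html><body><h1>My blog</h1></body></html>";
const out = injectCanary(page);
```

A non-React server would call something like `injectCanary` on each rendered page before sending the response, which is why the server-side route works even against bots that never run JavaScript.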
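The client-side path (importing `@fuzzycanary/core/auto`) injects the canary at runtime instead. A rough sketch of what such an auto-injector does, again with illustrative names and markup rather than the package's real internals:

```typescript
// Sketch of a client-side auto-injector, in the spirit of importing
// "@fuzzycanary/core/auto" from an entry file (illustrative, not the
// package's actual code).
function buildCanaryLink(): string {
  return '<a href="https://example.com/canary" style="display:none">canary</a>';
}

function autoInject(): void {
  // Only does anything in a browser. Bots that never execute
  // JavaScript (curl, many scrapers) never see the canary, which is
  // why the server-side method is the more effective of the two.
  const doc = (globalThis as { document?: { body: { insertAdjacentHTML(pos: string, html: string): void } } }).document;
  if (!doc) return;
  doc.body.insertAdjacentHTML("afterbegin", buildCanaryLink());
}

autoInject();
```

`insertAdjacentHTML("afterbegin", …)` places the link as the first child of `<body>`, matching where the server-side variant puts it.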
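The user-agent check that spares legitimate search engines might look like the following sketch. The allowlist patterns and function names here are assumptions for illustration; the library's actual list and matching logic are not shown in the post. Note that user agents are trivially spoofable, so this is a courtesy filter, not a security boundary.

```typescript
// Illustrative user-agent allowlist (not the library's real list).
// Known search-engine crawlers get the normal page; everyone else
// gets the canary.
const ALLOWED_CRAWLERS: RegExp[] = [/googlebot/i, /bingbot/i, /duckduckbot/i];

function isAllowedCrawler(userAgent: string): boolean {
  return ALLOWED_CRAWLERS.some((pattern) => pattern.test(userAgent));
}

function shouldInjectCanary(userAgent: string): boolean {
  // Inject for unknown agents; skip the search engines you rely on
  // for SEO.
  return !isAllowedCrawler(userAgent);
}
```

On a static site there is no per-request server to run this check, which is exactly why the post falls back to reading `navigator.userAgent` in the browser instead.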