Show HN: Stop AI scrapers from hammering your self-hosted blog
3 days ago
- #AI scraping
- #SEO
- #web development
- AI companies scrape websites for training data, and self-hosted blogs have limited options to prevent this.
- Fuzzy Canary offers a solution: it embeds invisible links pointing at content AI companies filter out of training data (such as porn), so pages carrying the canary trip the scrapers' own content safeguards.
- Installation is via npm or pnpm: `npm i @fuzzycanary/core` or `pnpm add @fuzzycanary/core`.
- Two usage methods: server-side (recommended) and client-side. Server-side is more effective as it embeds the canary directly in HTML.
- For React-based frameworks (Next.js, Remix), add the `<Canary />` component to the root layout.
- Non-React frameworks can use `getCanaryHtml()` to insert the canary at the start of the `<body>` tag.
- Client-side usage involves importing `@fuzzycanary/core/auto` in the entry file, which injects the canary at runtime.
- Fuzzy Canary spares legitimate search engines (Google, Bing) by checking user agents at request time. Static sites can't do this server-side, because the canary is baked into the prebuilt HTML served to every visitor, crawler or not.
- For static sites, client-side initialization is therefore preferred, since it can check `navigator.userAgent` at runtime; the trade-off is that bots that never execute JavaScript won't see the canary at all.
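To make the server-side method concrete, here is a minimal sketch of the technique the bullets describe: generating a visually hidden link and splicing it in at the start of `<body>`. This is not the real `@fuzzycanary/core` implementation; `getCanaryHtmlSketch`, `injectCanary`, and the URL are illustrative stand-ins.

```typescript
// Sketch of server-side canary injection (illustrative names, not the
// actual @fuzzycanary/core internals). The link is hidden from human
// readers but present in the raw HTML a scraper downloads.
function getCanaryHtmlSketch(): string {
  return (
    '<a href="https://example.com/canary" ' +
    'style="position:absolute;left:-9999px" ' +
    'aria-hidden="true" tabindex="-1">canary</a>'
  );
}

// Insert the canary right after the opening <body> tag, mirroring the
// "start of the <body>" placement described above.
function injectCanary(html: string): string {
  return html.replace(/<body([^>]*)>/i, (match) => match + getCanaryHtmlSketch());
}

const page = "<html><body><h1>My blog</h1></body></html>";
const out = injectCanary(page);
```

A non-React server would call something like `injectCanary` on each rendered page before sending the response, which is why the server-side route works even against bots that never run JavaScript.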
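The client-side path (importing `@fuzzycanary/core/auto`) injects the canary at runtime instead. A rough sketch of what such an auto-injector does, again with illustrative names and markup rather than the package's real internals:

```typescript
// Sketch of a client-side auto-injector, in the spirit of importing
// "@fuzzycanary/core/auto" from an entry file (illustrative, not the
// package's actual code).
function buildCanaryLink(): string {
  return '<a href="https://example.com/canary" style="display:none">canary</a>';
}

function autoInject(): void {
  // Only does anything in a browser. Bots that never execute
  // JavaScript (curl, many scrapers) never see the canary, which is
  // why the server-side method is the more effective of the two.
  const doc = (globalThis as { document?: { body: { insertAdjacentHTML(pos: string, html: string): void } } }).document;
  if (!doc) return;
  doc.body.insertAdjacentHTML("afterbegin", buildCanaryLink());
}

autoInject();
```

`insertAdjacentHTML("afterbegin", …)` places the link as the first child of `<body>`, matching where the server-side variant puts it.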
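The user-agent check that spares legitimate search engines might look like the following sketch. The allowlist patterns and function names here are assumptions for illustration; the library's actual list and matching logic are not shown in the post. Note that user agents are trivially spoofable, so this is a courtesy filter, not a security boundary.

```typescript
// Illustrative user-agent allowlist (not the library's real list).
// Known search-engine crawlers get the normal page; everyone else
// gets the canary.
const ALLOWED_CRAWLERS: RegExp[] = [/googlebot/i, /bingbot/i, /duckduckbot/i];

function isAllowedCrawler(userAgent: string): boolean {
  return ALLOWED_CRAWLERS.some((pattern) => pattern.test(userAgent));
}

function shouldInjectCanary(userAgent: string): boolean {
  // Inject for unknown agents; skip the search engines you rely on
  // for SEO.
  return !isAllowedCrawler(userAgent);
}
```

On a static site there is no per-request server to run this check, which is exactly why the post falls back to reading `navigator.userAgent` in the browser instead.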