Humanely Dealing with Humungus Crawlers
- #web-security
- #server-optimization
- #crawler-mitigation
- The author hosts hobby code on their server, which attracts thousands of crawlers daily.
- Countermeasures were developed to minimize annoyance to real humans while deterring crawlers.
- Challenges are presented only on deep URLs, not on frequently visited pages.
- Pages are cached by a reverse proxy to reduce load and avoid unnecessary challenges.
- Visitors who load 'style.css' are marked as friendly, on the assumption that such a request indicates a real browser and therefore a human.
- URLs visited more than once are assumed to be human traffic and bypass challenges (see the gating sketch after this list).
- Initial proof-of-work challenges were replaced with simple human-centric questions (e.g., 'How many Rs in strawberry?').
- Challenges require a moment of thought and ship no JavaScript solver, so automated clients cannot simply compute their way through (a sketch of such a challenge also follows the list).
- Log samples show patterns of crawler activity, often with user-agent strings disguised as those of legitimate browsers.
- The author notes that even modest server resources can be overwhelmed by persistent crawlers.
- Background tasks (e.g., Mastodon, Lemmy, RSS readers) add to server load.
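A minimal sketch of what this challenge gating could look like, assuming a Go HTTP middleware in front of the code host; the names (`gate`, `friendly`, `seenBefore`), the cookie, and the path-depth heuristic are illustrative assumptions, not the author's actual implementation:

```go
package main

import (
	"net/http"
	"net/url"
	"strings"
	"sync"
)

// seenURLs remembers paths that have already been requested once;
// a path requested a second time is treated as human traffic and not challenged.
var (
	mu       sync.Mutex
	seenURLs = map[string]bool{}
)

func seenBefore(path string) bool {
	mu.Lock()
	defer mu.Unlock()
	if seenURLs[path] {
		return true
	}
	seenURLs[path] = true
	return false
}

// isDeepURL: only deep URLs get challenged; the front page and other
// frequently visited pages are served (and cached by the reverse proxy) as usual.
func isDeepURL(path string) bool {
	return strings.Count(path, "/") > 2
}

// friendly reports whether the client previously fetched style.css, which
// sets a marker cookie: loading the stylesheet is taken as a sign of a real browser.
func friendly(r *http.Request) bool {
	c, err := r.Cookie("friendly")
	return err == nil && c.Value == "1"
}

func gate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/style.css" {
			// fetching the stylesheet marks this visitor as friendly
			http.SetCookie(w, &http.Cookie{Name: "friendly", Value: "1", Path: "/"})
			next.ServeHTTP(w, r)
			return
		}
		if !isDeepURL(r.URL.Path) || friendly(r) || seenBefore(r.URL.Path) {
			next.ServeHTTP(w, r) // shallow, friendly, or repeat URL: no challenge
			return
		}
		http.Redirect(w, r, "/challenge?next="+url.QueryEscape(r.URL.Path), http.StatusFound)
	})
}

func main() {
	http.Handle("/", gate(http.FileServer(http.Dir("."))))
	http.ListenAndServe(":8080", nil)
}
```

The reverse proxy still sits in front of all of this, so cached, frequently visited pages are answered before the gate is ever consulted.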
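And a sketch of the question-style challenge itself, under the same assumptions (the question list, the form fields, and the cookie name are made up for illustration): a plain HTML form with no JavaScript solver, so answering it takes a moment of human attention rather than computation.

```go
package main

import (
	"fmt"
	"html"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// A small pool of questions that take a moment of human thought but
// ship no JavaScript solver, so there is nothing for a bot to execute.
var questions = []struct{ q, a string }{
	{"How many Rs are in the word strawberry?", "3"},
	{"What is the fourth word of this question?", "fourth"},
}

func challenge(w http.ResponseWriter, r *http.Request) {
	next := r.URL.Query().Get("next") // where to send the visitor afterwards
	if r.Method == http.MethodPost {
		i, _ := strconv.Atoi(r.FormValue("q"))
		if i >= 0 && i < len(questions) &&
			strings.EqualFold(strings.TrimSpace(r.FormValue("answer")), questions[i].a) {
			// correct answer: mark the visitor friendly and send them on
			// (a real handler would validate next against local paths)
			http.SetCookie(w, &http.Cookie{Name: "friendly", Value: "1", Path: "/"})
			http.Redirect(w, r, next, http.StatusFound)
			return
		}
	}
	i := len(next) % len(questions) // any stable choice of question works
	fmt.Fprintf(w, `<form method="post" action="/challenge?next=%s">
  <p>%s</p>
  <input type="hidden" name="q" value="%d">
  <input name="answer"> <button>answer</button>
</form>`, url.QueryEscape(next), html.EscapeString(questions[i].q), i)
}

func main() {
	http.HandleFunc("/challenge", challenge)
	http.ListenAndServe(":8081", nil)
}
```

Registering this handler next to the gate middleware in the previous sketch closes the loop: a crawler hitting a deep URL gets the form and stalls, while a person answers once, picks up the friendly cookie, and browses unchallenged afterward.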