Humanely Dealing with Humungus Crawlers
- #web-security
- #server-optimization
- #crawler-mitigation
- The author hosts hobby code on their server, which attracts thousands of crawlers daily.
- Countermeasures were developed to minimize annoyance to real humans while deterring crawlers.
- Challenges are presented only on deep URLs, not on frequently visited pages.
- Pages are cached by a reverse proxy to reduce load and avoid unnecessary challenges.
- Visitors who load 'style.css' are marked as friendly, on the assumption that such a request indicates a real browser and therefore a human.
- URLs visited more than once are assumed to be human traffic and bypass challenges (see the gating sketch after this list).
- Initial proof-of-work challenges were replaced with simple human-centric questions (e.g., 'How many Rs in strawberry?').
- Challenges require a moment of thought and ship no JavaScript solver, so automated clients cannot simply compute their way through (a sketch of such a challenge also follows the list).
- Log samples show patterns of crawler activity, often with user-agent strings disguised as those of legitimate browsers.
- The author notes that even modest server resources can be overwhelmed by persistent crawlers.
- Background tasks (e.g., Mastodon, Lemmy, RSS readers) add to server load.
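A minimal sketch of what this challenge gating could look like, assuming a Go HTTP middleware in front of the code host; the names (`gate`, `friendly`, `seenBefore`), the cookie, and the path-depth heuristic are illustrative assumptions, not the author's actual implementation:

```go
package main

import (
	"net/http"
	"net/url"
	"strings"
	"sync"
)

// seenURLs remembers paths that have already been requested once;
// a path requested a second time is treated as human traffic and not challenged.
var (
	mu       sync.Mutex
	seenURLs = map[string]bool{}
)

func seenBefore(path string) bool {
	mu.Lock()
	defer mu.Unlock()
	if seenURLs[path] {
		return true
	}
	seenURLs[path] = true
	return false
}

// isDeepURL: only deep URLs get challenged; the front page and other
// frequently visited pages are served (and cached by the reverse proxy) as usual.
func isDeepURL(path string) bool {
	return strings.Count(path, "/") > 2
}

// friendly reports whether the client previously fetched style.css, which
// sets a marker cookie: loading the stylesheet is taken as a sign of a real browser.
func friendly(r *http.Request) bool {
	c, err := r.Cookie("friendly")
	return err == nil && c.Value == "1"
}

func gate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/style.css" {
			// fetching the stylesheet marks this visitor as friendly
			http.SetCookie(w, &http.Cookie{Name: "friendly", Value: "1", Path: "/"})
			next.ServeHTTP(w, r)
			return
		}
		if !isDeepURL(r.URL.Path) || friendly(r) || seenBefore(r.URL.Path) {
			next.ServeHTTP(w, r) // shallow, friendly, or repeat URL: no challenge
			return
		}
		http.Redirect(w, r, "/challenge?next="+url.QueryEscape(r.URL.Path), http.StatusFound)
	})
}

func main() {
	http.Handle("/", gate(http.FileServer(http.Dir("."))))
	http.ListenAndServe(":8080", nil)
}
```

The reverse proxy still sits in front of all of this, so cached, frequently visited pages are answered before the gate is ever consulted.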
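And a sketch of the question-style challenge itself, under the same assumptions (the question list, the form fields, and the cookie name are made up for illustration): a plain HTML form with no JavaScript solver, so answering it takes a moment of human attention rather than computation.

```go
package main

import (
	"fmt"
	"html"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// A small pool of questions that take a moment of human thought but
// ship no JavaScript solver, so there is nothing for a bot to execute.
var questions = []struct{ q, a string }{
	{"How many Rs are in the word strawberry?", "3"},
	{"What is the fourth word of this question?", "fourth"},
}

func challenge(w http.ResponseWriter, r *http.Request) {
	next := r.URL.Query().Get("next") // where to send the visitor afterwards
	if r.Method == http.MethodPost {
		i, _ := strconv.Atoi(r.FormValue("q"))
		if i >= 0 && i < len(questions) &&
			strings.EqualFold(strings.TrimSpace(r.FormValue("answer")), questions[i].a) {
			// correct answer: mark the visitor friendly and send them on
			// (a real handler would validate next against local paths)
			http.SetCookie(w, &http.Cookie{Name: "friendly", Value: "1", Path: "/"})
			http.Redirect(w, r, next, http.StatusFound)
			return
		}
	}
	i := len(next) % len(questions) // any stable choice of question works
	fmt.Fprintf(w, `<form method="post" action="/challenge?next=%s">
  <p>%s</p>
  <input type="hidden" name="q" value="%d">
  <input name="answer"> <button>answer</button>
</form>`, url.QueryEscape(next), html.EscapeString(questions[i].q), i)
}

func main() {
	http.HandleFunc("/challenge", challenge)
	http.ListenAndServe(":8081", nil)
}
```

Registering this handler next to the gate middleware in the previous sketch closes the loop: a crawler hitting a deep URL gets the form and stalls, while a person answers once, picks up the friendly cookie, and browses unchallenged afterward.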