Hasty Briefs (beta)

Humanely Dealing with Humongous Crawlers

  • #web-security
  • #server-optimization
  • #crawler-mitigation
  • The author hosts hobby code on their server, which attracts thousands of crawlers daily.
  • Countermeasures were developed to minimize annoyance to real humans while deterring crawlers.
  • Challenges are presented only on deep URLs, never on frequently visited pages (see the first sketch after this list).
  • Pages are cached by a reverse proxy to reduce load and avoid unnecessary challenges.
  • Visitors who load 'style.css' are marked as friendly: crawlers scraping raw HTML rarely fetch stylesheets, so the request suggests a human in a real browser.
  • URLs that are requested more than once are assumed to attract human traffic and bypass challenges, since crawlers typically enumerate each URL only once.
  • Initial proof-of-work (POW) challenges were replaced with simple human-centric questions (e.g., 'How many Rs in strawberry?'); the second sketch below shows the shape of such a challenge.
  • Challenge pages require a moment of thought and include no JavaScript solver, so automated clients cannot pass by simply executing a script.
  • Log samples show patterns of crawler activity, with crawlers often disguising themselves as legitimate browsers.
  • The author notes that even modest server resources can be overwhelmed by persistent crawlers.
  • Background tasks (e.g., Mastodon, Lemmy, RSS readers) add to server load.
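
Here is a minimal sketch of how the gating heuristics described above could fit together as HTTP middleware, written in Go. The post does not disclose the author's actual implementation; the depth threshold, the in-memory maps, the `/challenge` path, and the cache lifetime are all illustrative assumptions.

```go
package main

import (
	"log"
	"net"
	"net/http"
	"strings"
	"sync"
)

// Hypothetical in-memory state; the post does not say how these facts are stored.
var (
	mu       sync.Mutex
	friendly = map[string]bool{} // client IPs that have fetched style.css
	urlHits  = map[string]int{}  // how often each URL has been requested
)

// challengeGate decides, per request, whether to demand a challenge.
func challengeGate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}

		mu.Lock()
		// A client that loads the stylesheet is assumed to be a real
		// browser driven by a human, and is not challenged afterwards.
		if strings.HasSuffix(r.URL.Path, "/style.css") {
			friendly[ip] = true
		}
		urlHits[r.URL.Path]++
		seenBefore := urlHits[r.URL.Path] > 1 // repeat URLs pass unchallenged
		isFriendly := friendly[ip]
		mu.Unlock()

		// Only "deep" URLs are ever challenged; this depth threshold is a guess.
		deep := strings.Count(r.URL.Path, "/") >= 3

		if deep && !isFriendly && !seenBefore {
			http.Redirect(w, r, "/challenge", http.StatusFound)
			return
		}
		// Mark the response cacheable so a reverse proxy in front can
		// serve repeat visitors without hitting the backend at all.
		w.Header().Set("Cache-Control", "public, max-age=300")
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hobby code listing\n"))
	})
	log.Fatal(http.ListenAndServe(":8080", challengeGate(mux)))
}
```

Note how every heuristic errs in the visitor's favor: a shallow URL, a stylesheet fetch, or a repeat request is each on its own enough to skip the challenge.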
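And a sketch of the question-style challenge itself: a plain HTML form verified server-side, with deliberately no JavaScript on the page. The post gives 'How many Rs in strawberry?' only as an example question; the cookie name and redirect target here are hypothetical.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"
)

// challengeHandler serves a human-centric question as plain HTML.
// There is no JavaScript solver on the page, so an automated client
// cannot pass by executing a script; it must produce the answer itself.
func challengeHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method == http.MethodPost {
		// "strawberry" contains three Rs.
		if strings.TrimSpace(r.FormValue("answer")) == "3" {
			// Hypothetical cookie marking the client as friendly.
			http.SetCookie(w, &http.Cookie{Name: "human", Value: "yes", Path: "/"})
			http.Redirect(w, r, "/", http.StatusSeeOther)
			return
		}
	}
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, `<form method="post">
  <p>How many Rs in "strawberry"?</p>
  <input name="answer"> <button>Submit</button>
</form>`)
}

func main() {
	http.HandleFunc("/challenge", challengeHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```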