Hasty Briefs (beta)

Messing with Scraper Bots

8 days ago
  • #web-security
  • #bot-mitigation
  • #markov-chains
  • Scrapers are inadvertently DDoSing public websites, prompting requests for protection advice.
  • A novel approach involves feeding scrapers endless junk data via a Markov chain babbler to waste their resources.
  • Malicious bots targeting vulnerable files like .env and .php are the primary adversaries.
  • The author developed a system to serve fake .php files, aiming to waste bots' time and resources.
  • To address efficiency challenges, the author built a static garbage server that serves random paragraphs from the novel Frankenstein, exploiting crawlers' breadth-first behavior.
  • The project includes request counters and is designed to keep legitimate search engines from indexing it.
  • Caution is advised: deploying such measures on critical sites risks penalties from search engines.
  • Hidden links bait malicious scrapers into the trap without affecting the integrity of the real site.
  • The project was a learning experience in Markov chains and bot behavior, driven by fun and curiosity.
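The summary mentions a Markov chain babbler but does not show it. The following is a rough sketch only, not the author's code. It assumes a word-level chain of order 2, and the names `build_chain` and `babble` are hypothetical. The babbler records which word follows each pair of words in a corpus, then random-walks those transitions to produce text that looks plausible but means nothing:

```python
from __future__ import annotations

import random
from collections import defaultdict


def build_chain(text: str, order: int = 2) -> dict[tuple[str, ...], list[str]]:
    """Map each run of `order` consecutive words to every word that followed it."""
    words = text.split()
    chain: defaultdict[tuple[str, ...], list[str]] = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def babble(chain: dict[tuple[str, ...], list[str]], n_words: int = 50,
           rng: random.Random | None = None) -> str:
    """Random-walk the chain to produce plausible-looking junk text."""
    rng = rng or random.Random()
    states = list(chain)
    # rng.choice raises IndexError if the corpus was too short to build any state.
    state = rng.choice(states)
    out = list(state)
    while len(out) < n_words:
        followers = chain.get(state)
        if not followers:
            # Dead end (this state only appeared at the very end of the corpus):
            # restart from a random state so the babble never stops early.
            state = rng.choice(states)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = (*state[1:], nxt)
    return " ".join(out[:n_words])


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the cat"
    print(babble(build_chain(corpus), n_words=25))
```

Higher orders reproduce the source more faithfully but branch less often. Order 2 is a common middle ground for junk text that still reads as vaguely grammatical.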
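For the static garbage server, a minimal sketch under stated assumptions looks like this. Each junk page picks a few whole paragraphs from a pre-split book, which is presumably cheaper than generating babble per request. It also links to several fresh, random URLs under a trap prefix. The prefix `/garbage/`, the helper names, and the page layout are all hypothetical. The comments explain how the fan-out of links feeds a breadth-first crawler's queue:

```python
from __future__ import annotations

import html
import random

TRAP_PREFIX = "/garbage/"  # hypothetical URL prefix for the trap pages


def split_paragraphs(text: str) -> list[str]:
    """Split a plain-text book (e.g. a public-domain copy of Frankenstein) on blank lines."""
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


def garbage_page(paragraphs: list[str], rng: random.Random,
                 n_paragraphs: int = 3, n_links: int = 5) -> str:
    """Build one junk page: random paragraphs plus links to yet more junk pages."""
    body = "\n".join(
        f"<p>{html.escape(rng.choice(paragraphs))}</p>" for _ in range(n_paragraphs)
    )
    # Every page links to n_links fresh URLs, so a breadth-first crawler's
    # queue grows by roughly n_links entries for each page it fetches.
    links = "\n".join(
        f'<a href="{TRAP_PREFIX}{rng.getrandbits(64):016x}.html">more</a>'
        for _ in range(n_links)
    )
    return (
        "<!doctype html><html><head>"
        '<meta name="robots" content="noindex, nofollow">'
        f"</head><body>\n{body}\n{links}\n</body></html>"
    )


if __name__ == "__main__":
    book = "First placeholder paragraph.\n\nSecond placeholder paragraph."
    print(garbage_page(split_paragraphs(book), random.Random()))
```

Because any URL under the prefix can be answered with a freshly assembled page, the server needs no stored state. The crawl frontier, however, keeps widening.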
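The summary does not say how the request counters, the search-engine exclusion, the .php/.env bait, or the hidden links are implemented. The sketch below is one plausible arrangement using only Python's standard `http.server`. Its specific choices are all assumptions: the trap path, the `robots.txt` rule, the `X-Robots-Tag` header, the CSS-hidden link, and the names `classify` and `TrapHandler`. It counts requests per category and answers bait probes with junk. It also tells well-behaved crawlers to stay out:

```python
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

TRAP_PREFIX = "/garbage/"          # hypothetical trap path
BAIT_SUFFIXES = (".php", ".env")   # file types the summary says malicious bots probe for

# Crawlers that honour robots.txt (e.g. major search engines) will skip the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PREFIX}\n"

# Hypothetical bait link for real pages: hidden from humans via CSS,
# but still present in the HTML that naive scrapers parse and follow.
HIDDEN_LINK = f'<a href="{TRAP_PREFIX}start.html" style="display:none">.</a>'

hits = Counter()  # request counter per category


def classify(path: str) -> str:
    """Decide how to answer a request path (query string ignored)."""
    path = path.split("?", 1)[0]
    if path == "/robots.txt":
        return "robots"
    if path.startswith(TRAP_PREFIX):
        return "trap"
    if path.endswith(BAIT_SUFFIXES):
        return "bait"
    return "other"


class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        kind = classify(self.path)
        hits[kind] += 1
        if kind == "robots":
            self._send(200, "text/plain", ROBOTS_TXT)
        elif kind in ("trap", "bait"):
            # Placeholder junk; a babbler or paragraph picker would go here.
            n = hits[kind]
            self._send(200, "text/html",
                       f'<p>junk #{n}</p><a href="{TRAP_PREFIX}{n}.html">next</a>')
        else:
            self._send(404, "text/plain", "not found")

    def _send(self, status: int, ctype: str, text: str) -> None:
        body = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", f"{ctype}; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # robots.txt stops crawling; this header additionally asks not to index.
        self.send_header("X-Robots-Tag", "noindex")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Single-threaded server, so the shared Counter needs no locking.
    HTTPServer(("127.0.0.1", 8080), TrapHandler).serve_forever()
```

Hiding links with CSS is the kind of trick the caution above refers to. Search engines may treat hidden content on a real site as suspicious, so this belongs on a site where ranking does not matter.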