Messing with Scraper Bots
8 days ago
- #web-security
- #bot-mitigation
- #markov-chains
- Scrapers are inadvertently DDoSing public websites, prompting requests for protection advice.
- A novel approach involves feeding scrapers endless junk data via a Markov chain babbler to waste their resources.
- Malicious bots targeting vulnerable files like .env and .php are the primary adversaries.
- The author developed a system to serve fake .php files, aiming to waste bot time and resources.
- Efficiency problems with that approach led to a static garbage server that serves random paragraphs from the novel Frankenstein and exploits crawlers' breadth-first behavior.
- The project includes request counters and is designed to keep legitimate search engines from indexing the junk.
- Caution is advised, since deploying such measures on critical sites risks penalties from search engines.
- Hidden links are used to bait malicious scrapers without affecting site integrity.
- The project was a learning experience in Markov chains and bot behavior, driven by fun and curiosity.
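To make the "Markov chain babbler" idea concrete, here is a minimal sketch of a word-level, order-1 Markov chain text generator. It is not the author's code. The function names `build_chain` and `babble` and the toy corpus are hypothetical. The chain maps each word to every word seen right after it, so common transitions are picked more often. The generator walks the chain at random and restarts from a random word when it hits a dead end.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words observed right after it (order-1 chain)."""
    words = text.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Duplicates are kept on purpose so frequent transitions get more weight.
        chain[current].append(following)
    return dict(chain)


def babble(chain: dict[str, list[str]], n_words: int, rng: random.Random) -> str:
    """Random-walk the chain for n_words words, restarting on dead ends."""
    if not chain:
        return ""
    starts = list(chain)
    word = rng.choice(starts)
    out = [word]
    while len(out) < n_words:
        successors = chain.get(word)
        # A word with no recorded successor (e.g. the corpus's last word) is a dead end.
        word = rng.choice(successors) if successors else rng.choice(starts)
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    # Hypothetical toy corpus; a real babbler would be trained on a full book.
    corpus = "the monster saw the light and the monster fled into the night"
    print(babble(build_chain(corpus), 20, random.Random(42)))
```

Generating text this way is cheap per word, but it still costs CPU on every request. That is the kind of overhead a server handing out pre-stored paragraphs avoids.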
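One way to bait only misbehaving scrapers is to pair a hidden link with a `robots.txt` rule. Compliant crawlers skip the disallowed path, human visitors never see the link, and bots that ignore `robots.txt` follow it into the trap. This is a hedged sketch, not the author's setup. The `/garbage/` prefix, the `index.php` target, the link text, and the helper names are all hypothetical.

```python
TRAP_PREFIX = "/garbage/"  # hypothetical mount point of the junk-serving server


def robots_txt(trap_prefix: str = TRAP_PREFIX) -> str:
    """robots.txt rules asking compliant crawlers to stay out of the trap."""
    return f"User-agent: *\nDisallow: {trap_prefix}\n"


def hidden_bait_link(trap_prefix: str = TRAP_PREFIX) -> str:
    """An anchor browsers never render but naive HTML scrapers still parse."""
    return (
        f'<a href="{trap_prefix}index.php" rel="nofollow" '
        'style="display:none">archive</a>'
    )


if __name__ == "__main__":
    print(robots_txt())
    print(hidden_bait_link())
```

Search engines may still treat hidden links with suspicion, which is one reason such tricks are risky on important sites.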