6 min read | Saved February 14, 2026
Do you care about this?
The article describes a creative way to fight malicious web scrapers: using Markov chains to generate fake PHP files and serving them as decoys. It also outlines the risks involved and suggests that, while these tactics can be fun, they may not suit every website.
If you do, here's more
The author highlights the problem of web scrapers that inadvertently overload public websites, which has prompted many small service operators to ask for advice on protecting themselves. Rather than focusing on defensive measures, the author explores a counter-offensive. Inspired by an existing Markov chain babbler that generates endless streams of junk data to mislead scrapers, they built their own version capable of producing realistic-looking content. The goal is to tie up the resources of malicious bots that scrape sensitive information from poorly configured sites.
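The summary does not include the author's code, so the following is only a minimal sketch of how a Markov chain babbler works, assuming a word-level chain of order 2 written in Python. The names `build_chain` and `babble` are hypothetical. The chain records which words followed each pair of words in a training text; generation then repeatedly picks one of the recorded followers at random, so common continuations come up more often. The author's actual tokenization and chain order are not described here and may differ.

```python
import random
from collections import defaultdict


def build_chain(text: str, order: int = 2) -> dict[tuple[str, ...], list[str]]:
    """Map each run of `order` consecutive words to every word that followed it.

    Followers are kept with repeats, so frequent continuations are picked more often.
    """
    words = text.split()
    chain: defaultdict[tuple[str, ...], list[str]] = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i : i + order])].append(words[i + order])
    return dict(chain)


def babble(chain: dict[tuple[str, ...], list[str]], n_words: int, rng: random.Random) -> str:
    """Random-walk the chain until `n_words` words have been produced."""
    states = list(chain)  # rng.choice below raises IndexError if the corpus was too short
    state = rng.choice(states)
    out = list(state)
    while len(out) < n_words:
        followers = chain.get(state)
        if not followers:
            # Dead end (this state only occurred at the very end of the corpus): jump elsewhere.
            state = rng.choice(states)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = (*state[1:], nxt)  # slide the window forward by one word
    return " ".join(out[:n_words])


if __name__ == "__main__":
    # Hypothetical toy corpus; a real babbler would train on a much larger text.
    corpus = "the cat sat on the mat and the cat ran off the mat and sat down"
    print(babble(build_chain(corpus), 20, random.Random(42)))
```

Apart from the occasional restart, every three consecutive output words appear in that order somewhere in the training text, which is why the output looks plausible at a glance while meaning nothing. Raising `order` makes the output more faithful to the source and less varied.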
The author describes training a Markov chain on PHP files to generate fake responses that mimic real code. They tested this by serving progressively larger files, but server response times suffered as the files grew. To improve efficiency, they pivoted to a static site model, using the classic novel "Frankenstein" as source text to produce an endless supply of generated pages. This exploits the breadth-first strategy many crawlers follow: a crawler that queues every link it finds keeps adding generated pages to its backlog faster than it can work through them, burying it under a large volume of junk data.
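The static-site idea and its effect on a breadth-first crawler can be sketched in Python. This is a hypothetical reconstruction, not the author's code. It assumes each page is identified by a number, that page `n` links to `LINKS_PER_PAGE` further pages under a made-up `/page/<n>.html` URL scheme, and that filler text is drawn from a word list; a plain random word choice stands in for the Markov chain trained on "Frankenstein". `bfs_crawl` simulates a breadth-first crawler with a fixed fetch budget.

```python
import random
from collections import deque

LINKS_PER_PAGE = 5  # hypothetical fan-out; the article's real value is not stated


def page_links(n: int, k: int = LINKS_PER_PAGE) -> list[int]:
    """Page n links to pages n*k+1 .. n*k+k, so the site is an endless k-ary tree."""
    return [n * k + j for j in range(1, k + 1)]


def render_page(n: int, words: list[str], length: int = 30, k: int = LINKS_PER_PAGE) -> str:
    """Build page n's HTML. Seeding with n makes /page/n.html identical on every request."""
    rng = random.Random(n)
    # Stand-in filler: random words, not the Markov chain output the article describes.
    body = " ".join(rng.choice(words) for _ in range(length))
    links = "\n".join(f'<a href="/page/{m}.html">more</a>' for m in page_links(n, k))
    return f"<html><body>\n<p>{body}</p>\n{links}\n</body></html>"


def bfs_crawl(budget: int, k: int = LINKS_PER_PAGE) -> tuple[int, int]:
    """Simulate a breadth-first crawler starting at page 0.

    Returns (pages fetched, URLs still waiting in the queue) once the budget runs out.
    """
    seen = {0}
    queue = deque([0])
    fetched = 0
    while queue and fetched < budget:
        n = queue.popleft()
        fetched += 1  # "fetch" page n; a real crawler would parse links out of its HTML
        for m in page_links(n, k):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return fetched, len(queue)


if __name__ == "__main__":
    # Hypothetical word list for the demo.
    print(render_page(7, ["alpha", "beta", "gamma", "delta"]))
    print(bfs_crawl(100))
```

Seeding the generator with the page number means a given URL always returns the same page, so the pages behave like static files and can be cached or pre-generated for any finite range. The simulation shows why a breadth-first bot never finishes: each fetched page adds `k` new URLs but removes only one from the queue, so after `F` fetches the queue holds `1 + F(k - 1)` entries. With the default `k = 5`, `bfs_crawl(100)` returns `(100, 401)`.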
Cautionary advice accompanies the project. The author warns about the risks of these techniques, particularly where search engine indexing is concerned. The PHP babbler is deemed safe because it targets only malicious bots, but the static site of generated content risks being misclassified as spam by search engines such as Google. The author closes on a playful note, hiding a link to the babbler on their blog to lure bad scrapers, while expressing concern that the resulting traffic could hit the limits of their VPS's outbound transfer budget. Driven by a mix of curiosity and frustration, the project gave the author valuable insight into Markov chains and the behavior of scraper bots.