bug-guix

bug#52338: Crawler bots are downloading substitutes


From: Tobias Geerinckx-Rice
Subject: bug#52338: Crawler bots are downloading substitutes
Date: Fri, 10 Dec 2021 23:52:51 +0100

All,

Mark H Weaver writes:
For what it's worth: during the years that I administered Hydra, I found that many bots disregarded the robots.txt file that was in place there. In practice, I found that I needed to periodically scan the access logs for bots and forcefully block their requests in order to keep Hydra from becoming overloaded with expensive queries from bots.

Very good point.

IME (which is a few years old at this point), at least the highlighted BingBot & SemrushThing always respected my robots.txt, but it's definitely a concern. I'll leave this bug open to remind us to check on that in a few weeks or so…

If it does become a problem, we (I) might add some basic User-Agent sniffing to either slow down or outright block non-Guile downloaders. Whitelisting any legitimate ones, of course. I think that's less hassle than dealing with dynamic IP blocks whilst being equally effective here.
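
A minimal sketch of what such User-Agent sniffing could look like, written in Python purely for illustration. Everything concrete in it is an assumption rather than an existing configuration: the function name classify_user_agent, the idea that Guile-based downloaders can be recognised by a "GNU Guile" substring, and the two crawler patterns (taken from the bots mentioned above). The real thing would more likely live in the web server configuration.

```python
import re

# Hypothetical policy; the patterns are assumptions for illustration only.
# Whitelisted clients: assumed to announce themselves as "GNU Guile".
ALLOWED_PATTERNS = [re.compile(r"GNU Guile", re.IGNORECASE)]
# Known crawlers to refuse outright.
BLOCKED_PATTERNS = [re.compile(r"bingbot|semrush", re.IGNORECASE)]


def classify_user_agent(user_agent):
    """Return 'allow', 'block' or 'throttle' for a User-Agent header value."""
    ua = user_agent or ""  # a missing header is treated as an empty string
    # The whitelist is checked first, so legitimate clients always pass.
    if any(p.search(ua) for p in ALLOWED_PATTERNS):
        return "allow"
    if any(p.search(ua) for p in BLOCKED_PATTERNS):
        return "block"
    # Anything unrecognised is slowed down rather than refused.
    return "throttle"
```

Checking the whitelist before the block list keeps legitimate downloaders working even if a later block pattern turns out to be too broad. Unknown clients are throttled instead of blocked, which matches the "slow down or outright block" choice above.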

Thanks (again) for taking care of Hydra, Mark, and thank you Leo for keeping an eye on Cuirass :-)

T G-R
