Re: [Savannah-hackers-public] Web Crawler Bots
From: Karl Berry
Subject: Re: [Savannah-hackers-public] Web Crawler Bots
Date: Sat, 7 Jan 2017 23:36:59 GMT
Bob - as you probably know, there are some existing fail2ban filters for
this -- {apache,nginx}-botsearch.conf are the most apropos I see at
first glance. fail2ban is the only scalable/maintainable way I can
imagine to deal with it.
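(Enabling one of those jails takes only a few lines in
/etc/fail2ban/jail.local. A minimal sketch, assuming the stock
apache-botsearch filter and the apache_error_log path macro from
fail2ban's paths-*.conf; the one-day bantime is an arbitrary choice,
not a recommendation:

    [apache-botsearch]
    enabled  = true
    port     = http,https
    logpath  = %(apache_error_log)s
    maxretry = 2
    bantime  = 86400

The nginx-botsearch jail is set up the same way, pointed at the nginx
logs instead.)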
A nonscalable/nonmaintainable way ... for tug.org, years ago I created a
robots.txt based on spammer user-agent strings I found at
projecthoneypot.org
(https://www.projecthoneypot.org/harvester_useragents.php nowadays, it
seems), along the lines of the sketch below. It's still somewhat
beneficial, though naturally it was surely out of date the instant I
put it up, let alone now.
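(The format is trivial; a sketch from memory, with illustrative
harvester names rather than anything actually in the tug.org file:

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: *
    Disallow:

Of course this only deters bots polite enough to read robots.txt at
all, hence only "somewhat" beneficial.)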
I also threw in iptables rules by hand when the server was getting
bogged down. I hope one day I'll set up fail2ban (including recidive)
for it ... sketches of both below.
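By hand, banning a single address is a one-liner (192.0.2.13 is just a
placeholder from the documentation range):

    iptables -I INPUT -s 192.0.2.13 -j DROP

And the recidive jail, which re-bans repeat offenders it finds in
fail2ban's own log, is a few more lines in jail.local. These are the
stock values from recent fail2ban releases (older versions want the
times in seconds), so treat it as a sketch, not a tested config:

    [recidive]
    enabled   = true
    logpath   = /var/log/fail2ban.log
    banaction = %(banaction_allports)s
    bantime   = 1w
    findtime  = 1d

-k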