Re: [Sks-devel] robots.txt, grub-client
From: Jason Harris
Subject: Re: [Sks-devel] robots.txt, grub-client
Date: Thu, 23 Dec 2004 23:25:48 -0500
User-agent: Mutt/1.4.2.1i
On Thu, Dec 23, 2004 at 10:14:16PM -0500, Yaron Minsky wrote:
> Jason, do you have any suggestions as to how SKS could be extended to
> block inappropriate requests?
Any keyserver admin with root privileges may want to block the IPs of
webcrawlers (and keyserver abusers) using firewall software, first
and foremost.
The difficulty is with a distributed crawler like grub[.org], where
requests can come from anyone running a crawling client.
For that, it is easiest to block on User-Agent: headers that report
"grub-client" (using a substring match). Returning "403 Forbidden" is
probably best.
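As a sketch of that substring check (not SKS's actual code; the helper
name and pattern list here are my own), in C:

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the request's User-Agent header contains any of the
   blocked substrings (e.g. "grub-client").  A match should result in
   the server answering "403 Forbidden". */
static bool ua_is_blocked(const char *user_agent,
                          const char *const *blocked, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strstr(user_agent, blocked[i]) != NULL)
            return true;
    return false;
}
```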
I would suggest a file called "blocklist" that is read at least at
startup and specifies IP address ranges (in all three forms below)
and User-Agent: [sub]strings something like this:
user-agent: grub-client # all(?) grub crawlers by name
user-agent: Googlebot # all(?) Googlebot crawlers by name
ip: 207.68.128.0 - 207.68.207.255 # some M$ crawlers by IP
ip: 66.196.64.0/18 # some Yahoo! crawlers by IP
ip: 207.245.72.170 # actual ASDL abuser
(Use (C syntax) "(0xFFFFFFFFUL << (32 - n))" to mask off a /n netblock;
note that n == 0 needs special-casing, since shifting a 32-bit value by
32 is undefined behavior in C.)
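For illustration, a minimal C sketch of that mask plus a netblock
membership test (function names are my own, not anything in SKS; n == 0
is special-cased to avoid the undefined 32-bit shift by 32):

```c
#include <stdbool.h>
#include <stdint.h>

/* Netmask for a /n prefix, per the expression above. */
static uint32_t netmask(unsigned n)
{
    return n ? (uint32_t)(0xFFFFFFFFUL << (32 - n)) : 0;
}

/* True if addr (host byte order) falls inside net/n,
   e.g. the 66.196.64.0/18 Yahoo! block listed above. */
static bool in_netblock(uint32_t addr, uint32_t net, unsigned n)
{
    uint32_t mask = netmask(n);
    return (addr & mask) == (net & mask);
}
```

Single IPs are just /32 netblocks, and an explicit low-high range can be
checked with two unsigned comparisons instead.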
--
Jason Harris | NIC: JH329, PGP: This _is_ PGP-signed, isn't it?
address@hidden _|_ web: http://keyserver.kjsl.com/~jharris/
Got photons? (TM), (C) 2004