01/01: nginx: hydra.gnu.org: Block requests from SeznamBot.
From: Mark H. Weaver
Subject: 01/01: nginx: hydra.gnu.org: Block requests from SeznamBot.
Date: Fri, 7 Jul 2017 12:18:26 -0400 (EDT)
mhw pushed a commit to branch master
in repository maintenance.
commit ecb68166b5be71368471fd20f2543db0afdf8eb9
Author: Mark H Weaver <address@hidden>
Date: Fri Jul 7 12:15:26 2017 -0400
nginx: hydra.gnu.org: Block requests from SeznamBot.
* hydra/nginx/hydra.gnu.org.conf: Filter out SeznamBot by user-agent.
---
hydra/nginx/hydra.gnu.org.conf | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/hydra/nginx/hydra.gnu.org.conf b/hydra/nginx/hydra.gnu.org.conf
index e7d77a5..2afc838 100644
--- a/hydra/nginx/hydra.gnu.org.conf
+++ b/hydra/nginx/hydra.gnu.org.conf
@@ -86,14 +86,15 @@ http {
proxy_set_header X-Forwarded-Port $server_port;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
- # XXX Block AhrefsBot, Baiduspider, and Bing for now
- # TODO: Remove later in the hopes that they'll fetch
- # our improved /robots.txt before continuing.
+ # XXX Block AhrefsBot, Baiduspider, Bing, SeznamBot, and
+ # Google. These search engines seem to disregard our robots.txt,
+ # possibly because attempts to fetch robots.txt sometimes fails
+ # due to gateway timeout :-(
# Also block ltx71.com, which accesses our pages ~30 times/hour
# with no apparent pattern, including our robots.txt which it
# disregards. They claim to be "scanning the internet for
# security research purposes."
- if ($http_user_agent ~ "AhrefsBot|Baiduspider|bingbot|ltx71.com|GoogleBot|Googlebot") {
+ if ($http_user_agent ~ "AhrefsBot|Baiduspider|bingbot|SeznamBot|ltx71.com|GoogleBot|Googlebot") {
return 403;
break;
}
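
Some notes on the pattern in the hunk above. The `~` operator in an nginx `if` performs a case-sensitive regex match, which is presumably why both `GoogleBot` and `Googlebot` are listed. The `.` in `ltx71.com` is unescaped, so it matches any character. The `break` after `return 403` is never reached, since `return` stops processing. The following is only a sketch of how the same filter could be expressed with a `map`; it is not part of this commit. The variable name `$blocked_bot` and the `server`/`location` placement are assumptions made for illustration.

```nginx
# Hypothetical alternative, not taken from hydra.gnu.org.conf.
# The map block must sit in the http context; it classifies the
# User-Agent once per request.
map $http_user_agent $blocked_bot {
    default 0;
    # Leading "~" means case-sensitive regex, as in the original "if".
    # The dot in ltx71\.com is escaped here so it matches literally.
    "~AhrefsBot|Baiduspider|bingbot|SeznamBot|ltx71\.com|GoogleBot|Googlebot" 1;
}

server {
    location / {
        # "if" treats an empty string or "0" as false, anything else as true.
        if ($blocked_bot) {
            return 403;
        }
    }
}
```

With this layout, adding another crawler would mean editing only the pattern in the `map` block. Writing the pattern with a leading `~*` instead of `~` would make the match case-insensitive, so a single `Googlebot` entry would cover both spellings.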