[Savannah-hackers-public] [gnu.org #705563] many messages missing from mail archives


From: Bernie Innocenti via RT
Subject: [Savannah-hackers-public] [gnu.org #705563] many messages missing from mail archives
Date: Mon, 22 Aug 2011 16:04:33 -0400

> [karl - Sun Aug 21 21:01:41 2011]:
> 
> Hello sysadmins,
> 
> Unfortunately, it seems that messages are occasionally going missing
> from the archives on lists.gnu.org, even though they are being
> delivered.
> 
> Here is the example I have the most data for:
> on Wed 10 Aug 2011 03:28:00 AM PDT, Shailesh posted comment #8 on a
> savannah ticket,  https://savannah.gnu.org/support/?107667#comment8.
> This was received by me (and others) in email -- I will attach the
> text.  It was Message-Id: <20110810-address@hidden>.
> 
> However, looking at the thread index:
> https://lists.gnu.org/archive/html/savannah-hackers/2011-08/threads.html
> it does not appear (the thread, "CLISP: Permission denied", is about
> halfway down the page).  All other comments in the thread are there,
> including several from Shailesh.
> 
> Looking at the date index, for August 10:
> https://lists.gnu.org/archive/html/savannah-hackers/2011-08/index.html
> Shailesh's message is also not there.  (The message from him on August
> 11 is the next one, comment #9.)
> 
> Bizarrely, it apparently did not reach mail-archive.com, either.
> It would be the next message from comment #7,
> http://www.mail-archive.com/address@hidden/msg17425.html
> but it jumps to comment #9, like our thread index, even though
> address@hidden is subscribed to savannah-hackers.
> I don't get that.
> 
> It is also not in the mbox archive,
> /var/lib/mailman/archives/private/savannah-hackers.mbox.
> 
> Shailesh reposted his comment exactly, as comment #11, to test if it
> would reach the archives this time.  It did.  So, as one might guess,
> it is not about the content but about something happening at the time
> of the mail processing.
> 
> Now, the worst part: a Google search shows thousands of missing
> messages, past and present, even discounting google's overcounting of
> results and the likelihood that some of them are just threading
> computations going wrong.
> http://www.google.com/search?hl=en&safe=off&q=site%3Alists.gnu.org+threads.html+"message+not+available"&oq=site%3Alists.gnu.org+threads.html+"message+not+available"&aq=f&aqi=&aql=&gs_sm=e&gs_upl=19750l21000l0l21202l13l9l0l0l0l6l222l1231l2.6.1l9l0
> 
> Help?

It took me a while because the logs for 20110810 had already been rotated,
but I finally figured out what happened: the post was marked as spam on eggs
and diverted to the spam archive instead of the list (note the
take_sa_hint_router):

 2011-08-10 06:26:53 [3630] 1Qr5zt-0000wY-Gp <= address@hidden H=eggs.gnu.org [140.186.70.92]:54778 I=[140.186.70.17]:25 P=esmtp S=5383 address@hidden T="[sr #107667] CLISP: Permission denied" from <address@hidden> for address@hidden
 2011-08-10 06:26:53 [3632] cwd=/spool/exim4 3 args: /usr/sbin/exim4 -Mc 1Qr5zt-0000wY-Gp
 2011-08-10 06:26:53 [3630] SMTP connection from eggs.gnu.org [140.186.70.92]:54778 I=[140.186.70.17]:25 closed by QUIT
 2011-08-10 06:26:53 [3632] 1Qr5zt-0000wY-Gp => savannah-hackers <address@hidden> F=<www-address@hidden> P=<address@hidden> R=take_sa_hint_router T=spam_archive S=5383 QT=4s DT=0s
 2011-08-10 06:26:53 [3632] 1Qr5zt-0000wY-Gp Completed QT=4s

So Mailman *never* saw the message. mharc is probably smart enough to notice
the missing Message-Id from the next reply in the thread. This explains the
"message not available" lines in the archives.
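
To find other posts that took the same path, the delivery lines in the exim
main log can be filtered by router name. The following is only a minimal
sketch: the function name `quarantined` and the regular expressions are
illustrative assumptions, and they were written against the line format
quoted above, not against every log format exim can produce.

```python
import re

# Matches an exim delivery line ("=>") as quoted above: date, time,
# optional [pid], exim message id, then the rest of the line.
DELIVERY = re.compile(
    r"^\s*(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?:\[\d+\] )?(?P<msgid>\S+) => (?P<rest>.*)$"
)
# R= (router) and T= (transport) fields; the lookbehind keeps QT= and DT=
# from being mistaken for T=.
FIELD = re.compile(r"(?<!\S)(?P<key>[RT])=(?P<value>\S+)")

SAMPLE = [
    " 2011-08-10 06:26:53 [3632] 1Qr5zt-0000wY-Gp => savannah-hackers "
    "<address@hidden> F=<www-address@hidden> P=<address@hidden> "
    "R=take_sa_hint_router T=spam_archive S=5383 QT=4s DT=0s",
    " 2011-08-10 06:26:53 [3632] 1Qr5zt-0000wY-Gp Completed QT=4s",
]


def quarantined(lines, router="take_sa_hint_router"):
    """Yield (date, time, exim_id, transport) for deliveries made by `router`."""
    for line in lines:
        m = DELIVERY.match(line)
        if not m:
            continue  # not a delivery line
        fields = {f.group("key"): f.group("value")
                  for f in FIELD.finditer(m.group("rest"))}
        if fields.get("R") == router:
            yield m.group("date"), m.group("time"), m.group("msgid"), fields.get("T")


if __name__ == "__main__":
    for hit in quarantined(SAMPLE):
        print(*hit)
```

Feeding it an open log file instead of SAMPLE would list every message the
router diverted, which gives a rough count of how many posts never reached
Mailman.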

The exim routing proceeds like this:

 # We run spamassassin on the host that feeds mail to lists
 take_sa_hint_router:
   verify = false
   condition = ${if eq{${length_3:$h_X-Spam-Flag:}}{YES} {1} {0}}
   driver = accept
   transport = spam_archive

 spam_archive:
   driver = appendfile
   directory = /spam/$local_part/
   create_directory = true
   maildir_format = true
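
For readers less familiar with exim string expansions, the condition above
takes the first three characters of the X-Spam-Flag header and compares them
with "YES" (eq is a case-sensitive comparison). A rough Python equivalent,
as a sketch only: the function names are hypothetical, and exim's handling
of repeated or oddly folded headers is not reproduced here.

```python
from email import message_from_string


def takes_sa_hint(raw_message: str) -> bool:
    """Approximate the router condition: first 3 chars of X-Spam-Flag == "YES"."""
    msg = message_from_string(raw_message)
    flag = (msg.get("X-Spam-Flag") or "").strip()
    return flag[:3] == "YES"  # case-sensitive, like exim's eq


def quarantine_dir(local_part: str) -> str:
    """Mirror `directory = /spam/$local_part/` from the spam_archive transport."""
    return f"/spam/{local_part}/"


if __name__ == "__main__":
    flagged = "X-Spam-Flag: YES\nSubject: test\n\nbody\n"
    print(takes_sa_hint(flagged), quarantine_dir("savannah-hackers"))
```

Because the router uses driver = accept, a message that satisfies the
condition is handed to spam_archive and no later router gets to deliver it
to the list.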

The lost message was at
/spam/savannah-hackers/new/1312972013.H459966P3633.lists.gnu.org.
The reason why SpamAssassin marked it as spam is:

X-Spam-Report:
  *  3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
  *      [XXX.XXX.XXX.XXX listed in zen.spamhaus.org]
  *  0.8 RCVD_IN_SORBS_WEB RBL: SORBS: sender is an abusable web server
  *      [XXX.XXX.XXX.XXX listed in dnsbl.sorbs.net]
  *  1.4 RCVD_IN_BRBL_LASTEXT RBL: RCVD_IN_BRBL_LASTEXT
  *      [XXX.XXX.XXX.XXX listed in bb.barracudacentral.org]
  *  0.6 HS_INDEX_PARAM URI: Link contains a common tracker pattern.
  * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
  *      [score: 0.0000]
  *  0.8 RDNS_NONE Delivered to internal network by a host with no rDNS
  *  0.0 HELO_NO_DOMAIN Relay reports its domain incorrectly
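
Adding up the rule scores in the report shows how close the call was. The
sketch below parses the report as quoted and sums it. Note the assumption:
5.0 is SpamAssassin's default required_score, and the threshold actually
configured on eggs is not stated here. The names REPORT, rule_scores and
DNSBL_RULES are illustrative.

```python
import re

REPORT = """\
  *  3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
  *      [XXX.XXX.XXX.XXX listed in zen.spamhaus.org]
  *  0.8 RCVD_IN_SORBS_WEB RBL: SORBS: sender is an abusable web server
  *      [XXX.XXX.XXX.XXX listed in dnsbl.sorbs.net]
  *  1.4 RCVD_IN_BRBL_LASTEXT RBL: RCVD_IN_BRBL_LASTEXT
  *      [XXX.XXX.XXX.XXX listed in bb.barracudacentral.org]
  *  0.6 HS_INDEX_PARAM URI: Link contains a common tracker pattern.
  * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
  *      [score: 0.0000]
  *  0.8 RDNS_NONE Delivered to internal network by a host with no rDNS
  *  0.0 HELO_NO_DOMAIN Relay reports its domain incorrectly
"""

# A rule line is "*", a signed score, then the rule name; the bracketed
# continuation lines do not match.
RULE = re.compile(r"^[ \t]*\*[ \t]+(-?\d+\.\d+)[ \t]+([A-Z][A-Z0-9_]+)\b", re.M)
DNSBL_RULES = {"RCVD_IN_PBL", "RCVD_IN_SORBS_WEB", "RCVD_IN_BRBL_LASTEXT"}
ASSUMED_THRESHOLD = 5.0  # SpamAssassin default; the real setting is an assumption


def rule_scores(report):
    """Return [(rule_name, score), ...] for each rule line in the report."""
    return [(name, float(score)) for score, name in RULE.findall(report)]


scores = rule_scores(REPORT)
total = round(sum(s for _, s in scores), 1)
dnsbl = round(sum(s for n, s in scores if n in DNSBL_RULES), 1)

if __name__ == "__main__":
    print(f"total={total} dnsbl={dnsbl} flagged={total >= ASSUMED_THRESHOLD}")
```

The rules sum to exactly 5.0, of which 5.5 points come from the three
blacklist hits; everything else nets out negative, mostly thanks to
BAYES_00. Under the assumed default threshold the message was flagged by
the narrowest possible margin, and purely because of the sender's IP
address.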

The user apparently posted the comment from XXX.XXX.XXX.XXX, which seems to
belong to a dynamic block of an Indian ISP and is blacklisted in several
places.

I'm not sure how we could reduce the number of miscategorized posts. The
listhelper mechanism is for posts blocked by Mailman. The posts blocked by
SpamAssassin currently go to quarantine maildirs that nobody ever looks at.
(I'm not suggesting that someone should; it would require a huge amount of
time.)

-- 
Bernie Innocenti
Systems Administrator, Free Software Foundation



