From: Bernie Innocenti via RT
Subject: [Savannah-hackers-public] [gnu.org #705563] many messages missing from mail archives
Date: Mon, 22 Aug 2011 16:04:33 -0400

> [karl - Sun Aug 21 21:01:41 2011]:
>
> Hello sysadmins,
>
> Unfortunately, it seems that messages are occasionally going missing
> from the archives on lists.gnu.org, even though they are being
> delivered.
>
> Here is the example I have the most data for:
> on Wed 10 Aug 2011 03:28:00 AM PDT, Shailesh posted comment #8 on a
> savannah ticket, https://savannah.gnu.org/support/?107667#comment8.
> This was received by me (and others) in email -- I will attach the
> text. It was Message-Id: <20110810-
> address@hidden>.
>
> However, looking at the thread index:
> https://lists.gnu.org/archive/html/savannah-hackers/2011-08/threads.html
> it does not appear (the thread, "CLISP: Permission denied", is about
> halfway down the page). All other comments in the thread are there,
> including several from Shailesh.
>
> Looking at the date index, for August 10:
> https://lists.gnu.org/archive/html/savannah-hackers/2011-08/index.html
> Shailesh's message is also not there. (The message from him on August
> 11 is the next one, comment #9.)
>
> Bizarrely, it apparently did not reach mail-archive.com, either.
> It would be the next message from comment #7,
> http://www.mail-archive.com/address@hidden/msg17425.html
> but it jumps to comment #9, like our thread index, even though
> address@hidden is subscribed to savannah-hackers.
> I don't get that.
>
> It is also not in the mbox archive,
> /var/lib/mailman/archives/private/savannah-hackers.mbox.
>
> Shailesh reposted his comment exactly, as comment #11, to test if it
> would reach the archives this time. It did. So, as one might guess, it
> is not about the content but about something happening at the time of
> the mail processing.
>
> Now, the worst part: a Google search shows thousands of missing
> messages, past and present, even discounting google's overcounting of
> results and the likelihood that some of them are just threading
> computations going wrong.
> http://www.google.com/search?hl=en&safe=off&q=site%3Alists.gnu.org+threads.html+"message+not+available"&oq=site%3Alists.gnu.org+threads.html+"message+not+available"&aq=f&aqi=&aql=&gs_sm=e&gs_upl=19750l21000l0l21202l13l9l0l0l0l6l222l1231l2.6.1l9l0
>
> Help?

It took me a while because the logs for 20110810 had already been rotated, but
I finally figured out what happened: the post had been marked as spam on eggs
(note the take_sa_hint_router) and was ditched:

2011-08-10 06:26:53 [3630] 1Qr5zt-0000wY-Gp <= address@hidden H=eggs.gnu.org [140.186.70.92]:54778 I=[140.186.70.17]:25 P=esmtp S=5383 address@hidden T="[sr #107667] CLISP: Permission denied" from <address@hidden> for address@hidden
2011-08-10 06:26:53 [3632] cwd=/spool/exim4 3 args: /usr/sbin/exim4 -Mc 1Qr5zt-0000wY-Gp
2011-08-10 06:26:53 [3630] SMTP connection from eggs.gnu.org [140.186.70.92]:54778 I=[140.186.70.17]:25 closed by QUIT
2011-08-10 06:26:53 [3632] 1Qr5zt-0000wY-Gp => savannah-hackers <address@hidden> F=<www-address@hidden> P=<address@hidden> R=take_sa_hint_router T=spam_archive S=5383 QT=4s DT=0s
2011-08-10 06:26:53 [3632] 1Qr5zt-0000wY-Gp Completed QT=4s

So mailman *never* saw the message. mharc is probably smart enough to notice
the missing Message-Id from the next reply in the thread, which explains the
"message not available" lines in the archives.
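
To find other posts that met the same fate, the delivery lines in the Exim main log can be filtered by router name. The following is a minimal sketch, not a tool in use on the list server: the function name `quarantined` and the regular expression are hypothetical, and the pattern assumes the log format shown above (timestamp, bracketed PID, message id, then `R=` and `T=` fields).

```python
import re

# Matches an Exim "=>" delivery line in the format shown above and captures
# the timestamp, the Exim message id, the router, and the transport.
DELIVERY = re.compile(
    r"^(?P<date>\S+ \S+) \[\d+\] (?P<msgid>\S+) => .*?"
    r"\bR=(?P<router>\S+) T=(?P<transport>\S+)"
)

def quarantined(log_lines, router="take_sa_hint_router"):
    """Yield (timestamp, exim message id) for deliveries made by `router`."""
    for line in log_lines:
        m = DELIVERY.match(line)
        if m and m.group("router") == router:
            yield m.group("date"), m.group("msgid")

# Two of the log lines quoted above, rejoined onto single lines.
sample = [
    "2011-08-10 06:26:53 [3632] 1Qr5zt-0000wY-Gp => savannah-hackers "
    "<address@hidden> F=<www-address@hidden> P=<address@hidden> "
    "R=take_sa_hint_router T=spam_archive S=5383 QT=4s DT=0s",
    "2011-08-10 06:26:53 [3632] 1Qr5zt-0000wY-Gp Completed QT=4s",
]

for when, msgid in quarantined(sample):
    print(when, msgid)
```

Fed a rotated log file opened line by line instead of `sample`, the same function would list every message that went to the spam archive rather than to mailman.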

The exim routing proceeds like this:

# We run spamassassin on the host that feeds mail to lists
take_sa_hint_router:
  verify = false
  condition = ${if eq{${length_3:$h_X-Spam-Flag:}}{YES} {1} {0}}
  driver = accept
  transport = spam_archive

spam_archive:
  driver = appendfile
  directory = /spam/$local_part/
  create_directory = true
  maildir_format = true
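
The router condition takes the first three characters of the X-Spam-Flag header and compares them with the string YES; anything that matches goes to the spam_archive transport instead of to mailman. A rough Python equivalent of that test, as a sketch only (the function name `routed_to_spam_archive` is made up for illustration, and this is not how Exim implements it):

```python
from email import message_from_string

def routed_to_spam_archive(raw_message: str) -> bool:
    """Mimic the router condition: first 3 characters of X-Spam-Flag equal 'YES'."""
    msg = message_from_string(raw_message)
    flag = (msg.get("X-Spam-Flag") or "").strip()
    # Exim's eq{}{} is a case-sensitive string comparison, so "yes" would not match.
    return flag[:3] == "YES"

flagged = "Subject: test\nX-Spam-Flag: YES\n\nbody\n"
clean = "Subject: test\n\nbody\n"

print(routed_to_spam_archive(flagged))  # True
print(routed_to_spam_archive(clean))    # False
```

The decision rests entirely on a header set by the upstream host (eggs), so the list server itself never re-evaluates the message.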

The lost message was at
/spam/savannah-hackers/new/1312972013.H459966P3633.lists.gnu.org. The reason
why SpamAssassin marked it as spam is:
X-Spam-Report:
* 3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
* [XXX.XXX.XXX.XXX listed in zen.spamhaus.org]
* 0.8 RCVD_IN_SORBS_WEB RBL: SORBS: sender is an abusable web server
* [XXX.XXX.XXX.XXX listed in dnsbl.sorbs.net]
* 1.4 RCVD_IN_BRBL_LASTEXT RBL: RCVD_IN_BRBL_LASTEXT
* [XXX.XXX.XXX.XXX listed in bb.barracudacentral.org]
* 0.6 HS_INDEX_PARAM URI: Link contains a common tracker pattern.
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
* [score: 0.0000]
* 0.8 RDNS_NONE Delivered to internal network by a host with no rDNS
* 0.0 HELO_NO_DOMAIN Relay reports its domain incorrectly
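
Adding up the rule scores in the report shows how close the call was. The sketch below parses the report text as quoted above and sums the scores; the threshold of 5.0 is SpamAssassin's default `required_score` and is only an assumption about what eggs was configured with.

```python
import re
from decimal import Decimal

REPORT = """\
* 3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
* [XXX.XXX.XXX.XXX listed in zen.spamhaus.org]
* 0.8 RCVD_IN_SORBS_WEB RBL: SORBS: sender is an abusable web server
* [XXX.XXX.XXX.XXX listed in dnsbl.sorbs.net]
* 1.4 RCVD_IN_BRBL_LASTEXT RBL: RCVD_IN_BRBL_LASTEXT
* [XXX.XXX.XXX.XXX listed in bb.barracudacentral.org]
* 0.6 HS_INDEX_PARAM URI: Link contains a common tracker pattern.
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
* [score: 0.0000]
* 0.8 RDNS_NONE Delivered to internal network by a host with no rDNS
* 0.0 HELO_NO_DOMAIN Relay reports its domain incorrectly
"""

# A rule line is "* <score> <RULE_NAME> ..."; bracketed detail lines do not match.
RULE = re.compile(r"^\*\s+(-?\d+\.\d+)\s+(\S+)")

scores = {}
for line in REPORT.splitlines():
    m = RULE.match(line)
    if m:
        scores[m.group(2)] = Decimal(m.group(1))

total = sum(scores.values())
REQUIRED_SCORE = Decimal("5.0")  # assumed: SpamAssassin's default threshold

print(total)                    # 5.0
print(total >= REQUIRED_SCORE)  # True
```

The three blocklist rules (RCVD_IN_PBL, RCVD_IN_SORBS_WEB, RCVD_IN_BRBL_LASTEXT) contribute 5.5 points between them, so the sender's IP address alone outweighed the -1.9 from the Bayes classifier.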

The user apparently posted the comment from XXX.XXX.XXX.XXX, which seems to
belong to a dynamic block of an Indian ISP and is blacklisted in several
places.

I'm not sure how we could reduce the number of miscategorized posts. The
listhelper mechanism is for posts blocked by mailman. The posts blocked by
SpamAssassin currently go to quarantine maildirs that nobody ever looks at.
(I'm not suggesting that someone should; it would require a huge amount of
time.)

--
Bernie Innocenti
Systems Administrator, Free Software Foundation