listhelper-discuss
Re: more debbugs-submit discards


From: Bob Proulx
Subject: Re: more debbugs-submit discards
Date: Wed, 3 Apr 2013 23:47:46 -0600
User-agent: Mutt/1.5.21 (2010-09-15)

Glenn Morris wrote:
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14132
Message-Id: <address@hidden>
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14133
Message-Id: <address@hidden>
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14134
Message-Id: <address@hidden>

Although I couldn't find the previous messages I do find these.
Everything is kept in maildir folders.  I grep through the list to
locate individual messages and then can look at them in more detail.
There are a lot of messages and therefore the grep runs for many
minutes.  Grep'ing for the above message-ids locates these:

  caughtham-mm/new/1365008455.M734284P27777.tedium:Message-Id: <address@hidden>
  caughtham/cur/1365008455.M680826P27773.tedium:2,S:Message-Id: <address@hidden>
  caughtspam-mm/new/1364996400.M495440P19056.tedium:Message-Id: <address@hidden>

  -rw-rw-r-- 1 2741 Apr  3 11:00 caughtham/cur/1365008455.M680826P27773.tedium:2,S
  -rw-rw-r-- 1 6046 Apr  3 11:00 caughtham-mm/new/1365008455.M734284P27777.tedium
  -rw-rw-r-- 1 6933 Apr  3 07:39 caughtspam-mm/new/1364996400.M495440P19056.tedium

The first time it was seen it was classified as spam and would have
had a discard control generated for it.  Then when you vivified the
message it came through and was classified as non-spam.

  caughtspam-mm/new/1365019798.M355529P3401.tedium:Message-ID: <address@hidden>
  -rw-rw-r-- 1 11566 Apr  3 14:09 caughtspam-mm/new/1365019798.M355529P3401.tedium

  caughtspam-mm/new/1364986345.M713387P8127.tedium:Message-ID: <address@hidden>
  -rw-rw-r-- 1  9297 Apr  3 04:52 caughtspam-mm/new/1364986345.M713387P8127.tedium

And those were also classified as spam when they came through.  So
same thing there.  They would have had a discard control generated.
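
As a rough illustration of the search described above (a sketch only,
not the script actually used on this machine), the same lookup can be
done by reading just the headers of each file instead of grepping whole
messages.  The function name, the folder list, and the root path are
assumptions made for the example; only the maildir "new" and "cur"
subdirectories are taken from the listings above.

```python
import os
from email.parser import BytesHeaderParser


def find_by_message_id(root, wanted_ids, folders):
    """Map each wanted Message-Id to the maildir files that carry it.

    root, wanted_ids and folders are illustrative parameters; the folder
    names passed in would be ones like "caughtspam-mm" or "caughtham".
    """
    parser = BytesHeaderParser()
    hits = {mid: [] for mid in wanted_ids}
    for folder in folders:
        for sub in ("new", "cur"):  # standard maildir subdirectories
            directory = os.path.join(root, folder, sub)
            if not os.path.isdir(directory):
                continue
            for name in sorted(os.listdir(directory)):
                path = os.path.join(directory, name)
                with open(path, "rb") as fh:
                    headers = parser.parse(fh)  # stops after the headers
                # Header lookup is case-insensitive, so "Message-ID" matches too.
                mid = str(headers.get("Message-Id", "")).strip()
                if mid in hits:
                    hits[mid].append(path)
    return hits
```

Reading only the headers avoids scanning message bodies, which is
where most of the time goes when the folders are large.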

All of those are in the raw mailman "caughtspam-mm" folders.  But if
they are in the spam folder then there should be a corresponding
"caughtspam" message, which is the same message after it ran through
spamassassin.  That would show the spamassassin score and presumably
would say why it was classified as spam.

If you vivified the message then unless you whitelisted the address
first it would have generated a new message and I should have found a
second copy of these latter two like there was for the first one.  I
don't see any, which seems odd.

I have been trying to improve the system.  I recently added IP-based
blacklisting for the very serious spam sources.  Did an IP match?  I
checked and no, there were no IP-based matches.  I currently have only
eight IPs from the most prolific spammers in the IP blacklist.  None
of those matched any of these three.
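
For illustration, an IP blacklist check of this kind can be sketched as
below.  This is an assumption about the general shape of such a check,
not the rule actually deployed: the blacklist entries are documentation
addresses, the function name is made up, and only bracketed IPv4
addresses in Received: headers are considered.

```python
import re
from email import message_from_string

# Hypothetical entries (reserved documentation addresses, not real spam sources).
IP_BLACKLIST = {"192.0.2.10", "198.51.100.7"}

# Matches an IPv4 address written in square brackets, as relays usually
# record it in a Received: header.
_IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def blacklisted_relays(raw_message, blacklist=IP_BLACKLIST):
    """Return blacklisted IPs found in the Received: headers, in header order."""
    msg = message_from_string(raw_message)
    found = []
    for received in msg.get_all("Received", []):
        for ip in _IPV4_IN_BRACKETS.findall(received):
            if ip in blacklist and ip not in found:
                found.append(ip)
    return found
```

An empty result corresponds to the "no IP-based matches" outcome
reported above.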

I also made some other small tweaks recently: weighting the
BAYES_9[59] rules higher and, when one of them hits, accepting a lower
overall spam score, down to 4.5.  But when I run those messages through
SA now I see this:

Message-Id: <address@hidden>
  Content analysis details:   (-1.9 points, 5.0 required)
   pts rule name              description
  ---- ---------------------- --------------------------------------------------
   0.0 RP_MATCHES_RCVD        Envelope sender domain matches handover relay
                              domain
  -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%

Message-Id: <address@hidden>
  Content analysis details:   (-1.9 points, 5.0 required)
   pts rule name              description
  ---- ---------------------- --------------------------------------------------
  -0.0 RCVD_IN_DNSWL_HI       RBL: Sender listed at http://www.dnswl.org/, high
                              trust
                              [208.118.235.10 listed in list.dnswl.org]
   0.0 RP_MATCHES_RCVD        Envelope sender domain matches handover relay
                              domain
  -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%


Message-Id: <address@hidden>
  Content analysis details:   (-1.9 points, 5.0 required)
   pts rule name              description
  ---- ---------------------- --------------------------------------------------
  -0.0 RCVD_IN_DNSWL_HI       RBL: Sender listed at http://www.dnswl.org/, high
                              trust
                              [208.118.235.10 listed in list.dnswl.org]
   0.0 RP_MATCHES_RCVD        Envelope sender domain matches handover relay
                              domain
  -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                              [score: 0.0000]
   0.0 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars

I imagine the current reason for the BAYES_00 is that listhelper is
probably subscribed to the mailing list and therefore has later
learned that those messages are nonspam.  But importantly there
weren't any other bad things seen.  And those messages seem so plain
that it is hard to see how they would have scored very high from the
BAYES test anyway.
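
To make the threshold tweak concrete, here is a minimal sketch of the
decision as described: the usual 5.0 requirement, lowered to 4.5 when
BAYES_95 or BAYES_99 hit.  The constant and function names are
hypothetical, the extra weighting of the BAYES rules themselves is not
modeled, and the real logic lives in the spamassassin configuration
rather than in Python.

```python
DEFAULT_REQUIRED = 5.0        # the "5.0 required" shown in the reports above
STRONG_BAYES_REQUIRED = 4.5   # lowered threshold from the recent tweak
STRONG_BAYES_RULES = {"BAYES_95", "BAYES_99"}  # what BAYES_9[59] expands to


def is_spam(score, rules_hit):
    """Classify a message from its total score and the names of the rules that hit."""
    if STRONG_BAYES_RULES & set(rules_hit):
        required = STRONG_BAYES_REQUIRED
    else:
        required = DEFAULT_REQUIRED
    return score >= required
```

With the reports above, `is_spam(-1.9, ["RP_MATCHES_RCVD", "BAYES_00"])`
is false under either threshold, which is why the tweak alone does not
explain the discards.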

Which leaves me with a big I-don't-know.

I have been making modifications.  There is always ongoing tinkering
in order to chase the continuously changing spam.  I will revert to
last week's copy for the moment and then look over the new changes
extra carefully.  This is a new problem and the only changes have been
my recent tweaks so Occam's Razor says it must be in the recent
changes somewhere.

The fact that they are in the caughtspam-mm folder but not in the
caughtspam folder shows me that something on my end is broken.
Because that shouldn't be happening.  Those should always be in
lockstep with each other.
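
A lockstep check along these lines could be sketched as follows.  It
assumes the Message-Id header survives the spamassassin pass unchanged,
so the raw and processed copies can be paired on it; the function names
and the root path parameter are made up for the example.

```python
import os
from email.parser import BytesHeaderParser


def message_ids(root, folder):
    """Collect the Message-Id of every message in one maildir folder."""
    parser = BytesHeaderParser()
    ids = set()
    for sub in ("new", "cur"):
        directory = os.path.join(root, folder, sub)
        if not os.path.isdir(directory):
            continue
        for name in os.listdir(directory):
            with open(os.path.join(directory, name), "rb") as fh:
                mid = str(parser.parse(fh).get("Message-Id", "")).strip()
            if mid:
                ids.add(mid)
    return ids


def missing_counterparts(root, raw_folder="caughtspam-mm",
                         processed_folder="caughtspam"):
    """List Message-Ids present in the raw folder but absent from the processed one."""
    return sorted(message_ids(root, raw_folder) - message_ids(root, processed_folder))
```

Any Message-Id this returns is a message that reached the raw folder
without a matching processed copy, which is the broken state described
above.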

Sorry for the breakage.

Bob


