[Pan-users] Re: Filtering out the flood at sci.crypt


From: Duncan
Subject: [Pan-users] Re: Filtering out the flood at sci.crypt
Date: Mon, 25 Jun 2007 16:26:41 +0000 (UTC)
User-agent: Pan/0.131 (Ghosts: First Variation)

JCA     <address@hidden> posted
address@hidden, excerpted
below, on  Mon, 25 Jun 2007 06:50:26 -0700:

>      For the last few weeks some idiot has taken to flooding sci.crypt
> (and possibly other groups) with junk. The postings are spoofed to
> appear as coming from regulars in the group, and the contents of the
> postings are just random drivel.
> 
>     Anybody know a rule, or set of rules, to filter them out? It would
> appear that the bogus postings all come from a specific news provider -
> things like
> 
>    news.highwinds-media.com!hw-filter.lga!newsfe04.lga.POSTED!53ab2750
> 
> but I don't know how to filter this out.

<mode=rant>

This is one reason I've pushed for a long time (since before pan had
scoring, when it was all binary-decision filtering; that's how long) for
scoring/filtering that could match anywhere in the post: in the body, or in
headers not in the overviews.  The problem is, the stuff in the overviews
can generally be entirely controlled by the poster, so if they want to be
deliberately disruptive and continuously modify this info in order to
evade scoring systems like pan's, unfortunately, there's not a lot that
the poor users of such clients can do.

The catch is, in order to score/filter on things not in the overviews,
the post must be downloaded first.  For better or for worse, Charles'
position has always seemed to emphasize scoring in order to choose
/what/ to download (and/or what to delete without downloading), simply
trusting that the overview data used to make such decisions isn't going to
be deliberately obfuscated to prevent such scoring/filters from working.

My position, OTOH, is that while it's a bonus if a useful score can be
used to ignore (ultimately, to kill/delete) or watch (ultimately, to auto-
download or at least mark for download) before downloading, just because
the post must be downloaded first doesn't mean the war is already lost. It
still takes time to view the message, and if automated tools (scoring/
filtering) can be used to either prioritize the viewing (in the case of
watch or positive scores), or to allow mark-read or deletion without
actual viewing (in the case of ignore or negative scores), well, the war
is still won, tho admittedly not as easily.
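To make the "score after download" idea concrete, here is a minimal Python sketch of it.  This is NOT something pan 0.131 does; the rule list, the `score_article`/`triage` names, and the IP and addresses in it are all hypothetical.  It scores a fully downloaded article on any header (including non-overview ones like X-Trace) or on the body, then decides whether to mark it read unviewed (negative score) or bump it up the reading order (positive score).

```python
import re
from email import message_from_string

# Hypothetical rules: (header name, or None for the body; regex; score delta).
RULES = [
    # Non-overview header: only visible once the full article is downloaded.
    ("X-Trace", re.compile(r"198\.51\.100\.7"), -100),
    # Body text: likewise only available after download.
    (None, re.compile(r"random drivel", re.IGNORECASE), -50),
    # An overview header, shown for contrast.
    ("From", re.compile(r"regular@example\.invalid"), +20),
]

def score_article(raw, rules=RULES):
    """Sum the deltas of every rule whose regex matches the article."""
    msg = message_from_string(raw)
    # Keep the sketch simple: only plain (non-multipart) bodies are scored.
    body = "" if msg.is_multipart() else msg.get_payload()
    total = 0
    for header, rx, delta in rules:
        text = body if header is None else " ".join(msg.get_all(header, []))
        if rx.search(text):
            total += delta
    return total

def triage(raw):
    """Map a score to an action, like pan's ignore/watch idea."""
    score = score_article(raw)
    if score < 0:
        return "mark-read"   # ignore: skip without viewing
    if score > 0:
        return "read-first"  # watch: prioritize for viewing
    return "normal"

flood = ("From: regular@example.invalid\nX-Trace: hypothetical 198.51.100.7\n"
         "Subject: test\n\nSome random drivel here.\n")
print(triage(flood))  # mark-read
```

The forged From still earns its +20, but the non-overview header and the body outweigh it, which is exactly what overview-only scoring can't do.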

Unfortunately, while I'd much rather have had effective filtering based on
/anything/ in the message than scoring still restricted to overview data
only, and while I've been a very active volunteer here on the pan
lists/groups, it seems your problem and mine don't hit enough people to
be very high on the priority list.

Back years ago, when I originally filed the request, Charles stated that
yes, he agreed that sort of thing would be useful.  However, it was for
him pretty much in the "nice to have at some point" category, and thus was
"blueskied" (aka "backburnered") into never-never-land.

BTW, even the official slrn scorefile documentation (slrn's scorefile
format is what pan uses) says non-overview headers can be matched, tho it
goes to pains to point out that it's less efficient since the posts must 
be downloaded before those scores will match.

Of course, Charles has always been quite open to patches, and I've little
doubt if someone with the skills had submitted a patch to implement this
functionality, we'd not be talking about it now as it'd work as well as
overview scoring does.  Unfortunately, that's not a set of skills I have,
and no one else has seemed to have the itch to scratch, so the
functionality remains "bluesky", nice to have "someday".

OTOH, the very fact that I'm still here means regardless of whether this
particular feature I'd sure like has been instituted or not, pan continues
to work better for me than the alternatives, so I guess I can't complain
too strenuously.

</mode=rant>

Meanwhile, despite the fact that we're left fighting with the equivalent
of our hands tied behind our backs, there's still a slight chance you can
find something useful to match.  I assume you've already found nothing
useful to match in the subject or author headers, and date, group, line-
count, xref, etc, are too generic to be useful.

That leaves one remaining possibility, the message-ID.  If you are lucky 
and this guy isn't an expert at this yet, the message-ID header, which 
*IS* part of the overview headers, will contain something identifying 
that can be scored on, hopefully without matching a bunch of other posts 
in the process.
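For reference, this is roughly what a client like pan gets from the server's overview (the NNTP OVER/XOVER response) for each article, sketched in Python.  The field order follows the NNTP standard (RFC 3977); the `Overview` class, the function name, and the sample line are hypothetical.  Note that Message-ID is there, but NNTP-Posting-Host, X-Trace and friends are not.

```python
from dataclasses import dataclass, field

@dataclass
class Overview:
    number: int
    subject: str
    author: str
    date: str
    message_id: str
    references: str
    byte_count: int
    line_count: int
    extra: list = field(default_factory=list)  # server-defined, e.g. "Xref: ..."

def parse_overview_line(line):
    """Parse one tab-separated OVER/XOVER line: article number, then the
    seven standard fields in fixed order, then any extra fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}")
    return Overview(
        number=int(fields[0]),
        subject=fields[1],
        author=fields[2],
        date=fields[3],
        message_id=fields[4],
        references=fields[5],
        byte_count=int(fields[6]) if fields[6] else 0,
        line_count=int(fields[7]) if fields[7] else 0,
        extra=fields[8:],
    )

# Hypothetical overview line, for illustration only.
sample = "\t".join([
    "12345", "Re: Filtering out the flood", "Someone <someone@example.invalid>",
    "Mon, 25 Jun 2007 06:50:26 -0700", "<hypothetical.1@example.invalid>",
    "", "1500", "30", "Xref: news.example.invalid sci.crypt:12345",
])
ov = parse_overview_line(sample)
print(ov.message_id)  # <hypothetical.1@example.invalid>
```

Everything a scorefile rule can match without downloading has to come out of one of those fields.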

Message-ID is (or is supposed to be) unique for each post, so you'll have
to use contains or regex matching.  You'll also have to hand-edit the
score in your scorefile, altho you can get most of the way there using
pan's GUI.  Of course, you first have to see if there's a part of the
message-ID that's unique to him but matches all his messages.  Turn view
headers on and check that header in several of his messages.  You will
likely want to compare those of other regulars as well, just to be sure
you won't over-match.  If you find something useful to match, select one
of his messages and add a score on it based on the References header,
which pan will auto-fill with the message-ID.  You'll need to edit out the
part that changes, of course.  Once you have it set up, add the score
(without rescore), but keep the view scores dialog open.  Then load the
scorefile in your favorite text editor and find the score (it should be at
the end).  Edit the References line, changing it to Message-ID.  Save the
file, and back in pan, NOW hit close and rescore in the article's view
scores window.  If you got it right, that should do it, and it won't match
anyone else's real posts.
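Before hand-editing the scorefile, it can help to test a candidate pattern against message-IDs copied from his posts and from the regulars' posts.  Here's a minimal Python sketch of that check; the function name and every message-ID in it are hypothetical, and Python's `re` syntax is close to, but not guaranteed identical with, the regex dialect pan's scorefile uses.

```python
import re

def check_pattern(pattern, his_ids, regular_ids):
    """Return (missed, overmatched): his IDs the pattern fails to match,
    and regulars' IDs it wrongly matches. Both should be empty."""
    rx = re.compile(pattern)
    missed = [mid for mid in his_ids if not rx.search(mid)]
    overmatched = [mid for mid in regular_ids if rx.search(mid)]
    return missed, overmatched

# Hypothetical Message-IDs, for illustration only.
his = ["<aB3xQ.1111@fe04.hypothetical.invalid>",
       "<zz9Pk.2222@fe04.hypothetical.invalid>"]
regulars = ["<1182780626.123@other.invalid>",
            "<qq7Rt.3333@fe05.hypothetical.invalid>"]

print(check_pattern(r"@fe04\.hypothetical\.invalid>$", his, regulars))  # ([], [])
print(check_pattern(r"hypothetical", his, regulars))
# ([], ['<qq7Rt.3333@fe05.hypothetical.invalid>'])
```

The second pattern catches all of his posts but also hits a regular's, which is the over-match you want to rule out before committing the score.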

As I said tho, the good attackers won't overlook the message-ID; they'll
set it themselves so it carries nothing identifying their provider, and
you'll have no reliable way to score their posts.  The best attackers
won't just fake the message-ID, they'll make it look like the ones used by
the regular author they're impersonating, so matching it will
unfortunately match the regular author's posts as well.

BTW, that highwinds-media entry looks familiar.  My ISP (Cox) outsources
its news service to them, so all Cox users get that stamp.  If it's
actually a Cox user, however, and not some other user of the same server,
a number of other headers will show up as well, including an unencrypted
NNTP-Posting-Host, an X-Complaints-To header listing address@hidden, and
an X-Trace header listing the same user IP as the NNTP-Posting-Host and
the same server as the POSTED entry.  If it doesn't have those elements,
it's probably not a Cox user, anyway.  Unfortunately, none of those
headers normally appear in the overviews, so pan can't properly score
against them. =8^(
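Outside pan, with the full headers in hand, that "is it really a Cox user" check is easy to automate.  A minimal Python sketch follows, built only on the observation above; the function names, the sample IP, and the X-Trace layout in the sample are hypothetical, and the real X-Complaints-To address is left out because the check only needs the header to be present.

```python
from email.parser import HeaderParser
from typing import Optional

def posted_server(path: str) -> Optional[str]:
    """Return the server named by the Path element ending in ".POSTED",
    e.g. "newsfe04.lga.POSTED" -> "newsfe04.lga"."""
    for element in path.split("!"):
        element = element.strip()
        if element.endswith(".POSTED"):
            return element[: -len(".POSTED")]
    return None

def looks_like_cox_post(raw_headers: str) -> bool:
    """Heuristic: NNTP-Posting-Host, X-Complaints-To and X-Trace are all
    present, and X-Trace names both the posting IP and the POSTED server."""
    headers = HeaderParser().parsestr(raw_headers)
    host = headers.get("NNTP-Posting-Host")
    complaints = headers.get("X-Complaints-To")
    trace = headers.get("X-Trace")
    server = posted_server(headers.get("Path", ""))
    if not (host and complaints and trace and server):
        return False
    tokens = trace.split()  # whole tokens, so 192.0.2.1 won't match 192.0.2.10
    return host.strip() in tokens and server in tokens

# Hypothetical header block, for illustration only.
sample = "\n".join([
    "Path: news.highwinds-media.com!hw-filter.lga!newsfe04.lga.POSTED!53ab2750",
    "NNTP-Posting-Host: 192.0.2.10",
    "X-Complaints-To: abuse@example.invalid",
    "X-Trace: newsfe04.lga 1182780626 192.0.2.10 (Mon, 25 Jun 2007 13:50:26 UTC)",
    "",
])
print(looks_like_cox_post(sample))  # True
```

Of course, this only works on downloaded headers, which is the whole problem with overview-only scoring.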

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman