
[Pan-users] Re: Tutorial on Scoring?


From: Duncan
Subject: [Pan-users] Re: Tutorial on Scoring?
Date: Wed, 06 Oct 2004 16:05:29 -0700
User-agent: Pan/0.14.2.91 (As She Crawled Across the Table)

Maurice Batey posted <address@hidden>,
excerpted below,  on Wed, 06 Oct 2004 21:39:29 +0100:

> Is there a file somewhere that is essentially a tutorial on how to use Pan's 
> scoring facility (e.g. to ignore certain threads, highlight others in 
> colour)?

None for PAN specifically, altho it uses a fairly standard scoring file
format so documents for the others with the same format should work (don't
remember /which/ others ATM, unfortunately, so can't refer you directly...).

However, I had never done scoring before and found PAN's method pretty
self-evident.  It really doesn't need a tutorial, from my viewpoint. 
Again, note that PAN's the first place I've done scoring, so it's not like I
brought my skills from elsewhere, either.

The general idea is that scores allow a "fuzzy logic" approach where
various factors can add to or subtract from a post's score, thus
influencing how it's marked.  A post can be "important" to you (watched),
highly interesting (a high positive score, but not high enough to be
"watched"), simply "interesting" (a lower positive score), normal (zero,
no score modifiers), probably more noise than signal but with the
occasional interesting comment (negative score), or ignored (authors or
subjects you don't care to see in any case).  The negative zone is where
the mixing pays off: an author or subject you'd normally skip scores
negative, but a favorite author posting to a dumb thread, or a usually
skipped author on an interesting thread, can add enough positive score to
bring the post back up to where you see it.  That's far more complex than
the simpler ignored/normal/watched that non-scoring filters generally
permit.

PAN also allows one to change the colors for the various score zones,
under preferences, and to create filters based on score zone, and then use
them in rules, which can then be set up to automatically download posts
for watched overviews, for instance, or to automatically delete ignored
posts (as I do).  You can also use a rule to set watched or ignored, altho
scoring is more flexible.

There are at least two ways you can use the scoring system to your
advantage.  One would be to simply set up the scores to color-code into
various categories, not using them for hiding or deleting posts, or
setting them watched, per se, at all, just for color coding.  Back when I
was on the Mandrake Cooker list (I've since switched to Gentoo), I used to
do this, coding discussion of "system" elements one color, while
discussion of packages like KDE that I use but that weren't system
essential, were a different color.  I did use ignore on the set of
packages (most server app packages, for instance, since my computer is a
desktop/workstation, not a server) I didn't use at all.  For this use, the
"set score to" function in the score dialog is best.  Then, you just pick
a number of categories <= the number of color code options you have, and
note the score zones as listed in the colors settings, and set your scores
for items in each category so it shows up the desired color.

The other way is the traditional use of scoring.  That is, to allow you to
for instance mark a person as normally not worth reading, so a negative
score, but not entirely ignored, so if for instance a subject comes up you
are interested in, you get to see what he says as well if you mark the
subject a higher positive than you did the author negative.  This of
course assumes you have PAN's display filters set to hide negative scoring
articles.  You can of course do the inverse, as well, marking subject
keywords you don't normally follow negative, while marking favorite
authors enough positive to more than offset the negative of the subject.

You also still have the kill and watch options (-99999 and +99999
respectively), and can use them as usual.
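
For those who think better in code, here's a rough sketch of that additive
idea.  It's Python, not anything PAN itself runs.  The rule list, the sample
addresses and subjects, and the modifier values are all made up for
illustration, and having a "set" rule override the running total is my
assumption about how set-to behaves.  The kill and watch values are the
ones quoted above.

```python
import re

KILL = -99999   # "ignored" value as quoted in this post
WATCH = 99999   # "watched" value as quoted in this post

# Hypothetical rules: (header, pattern, operation, value).
# "add" adjusts the running score; "set" fixes it outright.
RULES = [
    ("from", r"noisy@example\.invalid", "add", -100),
    ("subject", r"scoring", "add", 250),
    ("subject", r"make money fast", "set", KILL),
]

def score(article):
    """Return the total score for an article given as a dict of headers."""
    total = 0
    for header, pattern, op, value in RULES:
        if re.search(pattern, article.get(header, ""), re.IGNORECASE):
            if op == "set":
                return value  # assumption: a "set" rule overrides the total
            total += value
    return total

def visible(article):
    """Assumes the display filter hides anything scoring below zero."""
    return score(article) >= 0

noisy_offtopic = {"from": "noisy@example.invalid", "subject": "my cat"}
noisy_ontopic = {"from": "noisy@example.invalid", "subject": "Tutorial on Scoring?"}
spam = {"from": "x@example.invalid", "subject": "MAKE MONEY FAST"}
```

With these made-up numbers the normally skipped author is hidden on an
uninteresting thread, but shows up on the scoring thread because the
subject's positive more than offsets the author's negative.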

Some hints for PAN, rules and filters:

You may find it helpful to set rules to delete ignored posts automatically
at d/l.  Likewise, it may be helpful to set a rule to mark posts scored
negative, as read, particularly if you don't display them by default. 
Otherwise, you'll read everything listed and still have a number of posts
listed as unread in the groups pane and title bar, because they are not
displayed because the score is below zero.  

The above makes a distinction between ignored (auto-deleted) posts and
simply scored negative (auto-marked-read) posts.  The idea here is that
you may have some folks you /never/ want to see what they wrote (or
subjects you never want to follow), which you can set ignored, while
others you /normally/ don't want to see, but if something else increases
the score above zero you might consider it worth reading.  As well, if
someone quotes an auto-deleted post and you want to go back and get
context, you have to turn off the auto-delete rule and redownload
overviews (inaccurately called headers) to get the post that was deleted. 
If you just auto-mark-read, you can simply set the filters to show posts
marked negative, and posts marked read, and the overview will be there for
you to fetch and read without the additional hassle of redownloading all
overviews to get it back because it was deleted!

Also, note that an auto-delete rule set to apply to incoming posts runs
only as posts come in.  That's NOT the same as having it apply immediately
when you create a new ignore score.  At that point you can hit the
add-and-rescore button in the score dialog and it'll hide other messages
that match the ignore score, if that's what you have set.  However it will
NOT delete them until you manually invoke the rule.

Thus, we come to setting up those filters and rules.  The first thing that
isn't quite intuitive to note here is how to get a "less than" filter,
since the only option listed is "at least".  The easiest way to explain
this is with an example.  To get a filter matching "ignored" only, set "at
least low", THEN  HIT THE ADD LINE BUTTON (sometimes I forget this step,
just hitting OK, and then wonder why my filter doesn't work).  The
condition will then be added to the list of conditions above, but it'll
still be the wrong sense.  Now, select that line and hit the "Invert"
button, and NOW it'll change to the "ignored" we originally wanted.  Add
other conditions if desired (none in our example, since it's just an
"ignored" filter), and set a useful name ("Ignored", for our example),
then OK.  You should now have a filter named "Ignored" that will select
posts that are, what else, "ignored"!?
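
The "at least, then invert" trick is easier to see as a pair of tiny
functions.  This is only a Python sketch of the logic, not PAN's own code.
The names are hypothetical, and placing the bottom of the "low" zone one
point above the kill value is an assumption on my part.

```python
from typing import Callable

ScoreFilter = Callable[[int], bool]

IGNORED = -99999          # kill value as quoted earlier in this post
LOW_FLOOR = IGNORED + 1   # assumption: "low" starts just above "ignored"

def at_least(threshold: int) -> ScoreFilter:
    """The only comparison the dialog offers: score >= threshold."""
    return lambda s: s >= threshold

def invert(f: ScoreFilter) -> ScoreFilter:
    """What the Invert button does to a selected condition."""
    return lambda s: not f(s)

# "at least low", inverted, leaves only what sits below the low zone.
ignored_only = invert(at_least(LOW_FLOOR))
```

Calling ignored_only on a post's score then says whether the "Ignored"
filter would select it.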

Now you can create that rule I suggested above, to delete "ignored" posts.
You'll probably want it to apply to all groups, so ensure that's selected
on the newsgroups tab of the create rule dialog.  On the filter tab,
select the "ignored" filter we just created.  On the actions tab, select
delete article.  As with the filter, a good name for this rule is simply
"Ignored".  Don't forget to check the apply to incoming box unless you
only want it to apply when invoked manually.

You may now wish to create a "Low score" or "Neg. Score" filter and rule,
that marks those posts as read, as they come in.  Of course, if desired,
you can simply have all low AND ignored posts deleted, but as I mentioned,
I at least find it useful to make the distinction and simply mark
negatively scored posts as read.

Again, when creating new scores, don't forget that these rules will NOT be
invoked automatically on posts already downloaded.  You will need to load
the rules dialog, select the rule in question, and apply it to either your
selected group(s) or all groups, as desired, for the auto-deletions to
occur with new scores on existing downloaded material.

Hints on the score dialog:

**BE SURE TO SET AN OPTION IN EACH CATEGORY!**  How often I have failed to
set one or another option as desired, and ended up with a score that
didn't do what I intended!

In particular, by default, the score will /only/ apply to the single group
you are currently in.   At least here, I usually want it to apply to
several or all groups.  Thus, for example, setting a filter on all
alt.binaries.* groups, I select the starts with, and delete the rest of
the group name.  *DO NOT FORGET TO SET THE  STARTS WITH IN THIS CASE, OR
IT WILL LIKELY MATCH **NO** GROUPS!!*  (That's because all that's left is
alt.binaries, which isn't a literal official group by itself.)

If you want to set **ALL** groups, set to regular expression, and use the
".*" (w/o the quotes) expression, which means any character (the dot means
this) any number of times (0 or more, the * means this).    Another simple
way is to simply set "contains", and a single dot, since all groups in
practice (well, except perhaps some private groups on private servers)
have the . in them somewhere.  If you don't know how to work with regular
expressions, you can often figure out a way to do close to what you want,
but you are seriously limited in your advanced matching functionality. 
Thus, I'd recommend learning them.
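
To make the group matching concrete, here's a small Python sketch using
its re module as a stand-in for PAN's matcher.  The group list is made up,
and treating "starts with" as a ^ anchor and an exact match as ^...$ is my
assumption about how the dialog options map onto regular expressions.

```python
import re

# Made-up group list; "localgroup" stands in for a private, dot-less group.
GROUPS = [
    "alt.binaries.pictures",
    "alt.binaries.sounds.mp3",
    "comp.os.linux.misc",
    "localgroup",
]

def matching(pattern):
    """Return the groups whose names match the regular expression."""
    return [g for g in GROUPS if re.search(pattern, g)]

starts_with = matching(r"^alt\.binaries")   # "starts with" alt.binaries
exact = matching(r"^alt\.binaries$")        # forgot to set "starts with"
everything = matching(r".*")                # any character, 0 or more times
contains_dot = matching(r"\.")              # "contains" a literal dot
```

Here the exact match comes back empty, which is the "matches NO groups"
trap, while the contains-a-dot shortcut misses only the dot-less private
group.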

Also, while I guess adding 100 to the score is a decent default, in case
someone screws up, I far more often invoke the score dialog to ignore
a partial subject or author.  A couple of times I've forgotten to set
"ignore", and wondered why my ignore wasn't working.  Likewise, the 30 day
expiration is a reasonably safe default, but if I'm going to the bother, I
usually want 90 days minimum, and will often killfile someone for six
months.  Of course, a lot of spam gets "never expire".

Again, note that keyword scoring is generally more effective than specific
subject or author scoring, particularly when it's spam you are ignoring. 
Just be reasonably sure you aren't setting something bigger than you
intended.  An example from privoxy, my web filter app (thus not PAN)
should help here.  A common web ad filter filters on the keyword "ads",
but note that without modifiers, that will also get "adsl" and
"downloads", among other things, filtering them too, and that's likely
/not/ desired.

Also worth noting is that all PAN's filtering and scoring is based on case
insensitive matching.  Thus, ADS and ads and Ads and aDs all look the same
to PAN.
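
The "ads" problem is easy to demonstrate.  The sketch below is Python, with
its re module standing in for whatever PAN uses.  The word list is made up,
and whether PAN's expressions accept the \b word-boundary modifier is an
assumption, so test before relying on it.

```python
import re

words = ["ads", "ADS", "adsl", "downloads", "banner ads here"]

# Bare keyword, case-insensitive: matches anywhere inside a longer word.
loose = re.compile(r"ads", re.IGNORECASE)

# Same keyword with word boundaries on both sides.
strict = re.compile(r"\bads\b", re.IGNORECASE)

loose_hits = [w for w in words if loose.search(w)]
strict_hits = [w for w in words if strict.search(w)]
```

The loose pattern catches every entry, "adsl" and "downloads" included,
while the bounded one keeps only the entries where "ads" stands as a word
of its own.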

One of the frustrations with PAN at this point, is the SCOPE of the
filtering and scoring.  Unfortunately, it's limited to a small subset of
the headers (basically, most of those found in a standard overview). 
Thus, one cannot (unfortunately) score on for instance the
NNTP-Posting-Host: header, as it's not in the overview, nor can one score
on words or phrases occurring within the message body or the entire post. 
This is a MAJOR limitation.  It'll likely eventually go away, but don't
expect that any time in the near future, as PAN development seems to be
virtually frozen at present (the last beta release was in January, and
development had been slow for at least six months before that).

Editing the score file itself:

If you make a mistake adding a score or simply are frustrated with the
score dialog, you'll find editing the score file directly to be of use. 
Again, it should be fairly self-evident, particularly if you are familiar
with regular expressions and how config files tend to be laid out in
general.

Probably the biggest thing to remember is that comments start with a %
symbol.  PAN adds a lot of "cruft" to the file in terms of comments, which
can all be safely deleted.  I do that here, simply to keep it manageable,
as removing all that generally more than halves score file size!  Once you
erase (either in your head or in the file) all those comments, the entry
layout should be rather evident and far less confusing.  [] begins an
entry and encloses the group expression, which is listed in terms of a
regular expression, no matter /how/ you put it in, in the score dialog. 
Again, know regular expressions, and this shouldn't be an issue at all. 
Don't, and ... well ... learn them.  <g>

That's followed by the score line: = sets the score to an absolute value,
while +/- adds to or subtracts from the existing score to yield a relative
value.  That's how scoring does its "fuzzy logic".

Next is the Expires line, if there.  (PAN includes it even if not
expiring, as it includes all sorts of other extraneous stuff, but it's
commented.)

Next is the actual matching action.  Again, PAN includes all sorts of
info, all commented except the (usually) single active match, which is
again a regular expression.  Note that it's possible to do multi-match if
desired.  You may edit, comment, or uncomment each match, as needed, to
achieve your desired ends.

Here, after removing all the extraneous comments and blank lines, that's
all there is to most of my entries.  Three simple lines, four if there's
an expiration, then a blank line, then the next entry.
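
As a rough picture of that layout, here's a Python sketch that reads two
cleaned-up entries.  The sample text is illustrative only: the "Score:",
"Expires:" and "Subject:" field names and the date format are assumptions
borrowed from the common slrn-style score file, so check them against what
PAN actually wrote to your own file.  The parser is likewise just a
reading aid, not how PAN loads the file.

```python
# Illustrative entries; field names and date format are assumptions.
SAMPLE = """
% comments start with a percent sign and can be deleted
[alt.binaries.*]
Score: =-99999
Expires: 4/6/2005
Subject: make money fast

[.*]
Score: +250
Subject: scoring
"""

def parse(text):
    """Split score-file text into entries, skipping blanks and % comments."""
    entries, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # [] begins an entry and encloses the group expression.
            current = {"groups": line[1:-1], "matches": []}
            entries.append(current)
        elif current is not None:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "Score":
                current["score"] = value      # "=" sets, "+"/"-" adjusts
            elif key == "Expires":
                current["expires"] = value    # optional line
            else:
                current["matches"].append((key, value))
    return entries

entries = parse(SAMPLE)
```

The first entry has all four lines, the second has only three because it
never expires.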

If desired or you think you may screw up your edits, save a backup copy of
the score file.  However, the format is simple enough.  With a proper
understanding of regular expressions, editing mistakes should be rare
enough, and easy enough to find and correct if a mistake IS made, that a
backup copy shouldn't really be necessary except for those obsessed with
keeping backups of their data at all times.

-- 
Duncan - List replies preferred.   No HTML msgs.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety." --
Benjamin Franklin





