RE: [Pan-users] Composing regex for Pan


From: Michael R. McCarrey
Subject: RE: [Pan-users] Composing regex for Pan
Date: Mon, 15 Mar 2004 11:43:27 -0800

On Sun, 2004-03-14 at 06:01, Paul Hudson wrote:
> > > 
> > >  \b[:upper:]{2,}\b
> > This dumped all replies. The regex animal book doesn't 
> > explain those constructs very well (nor have any of the web 
> > sites I've looked at). 
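A possible reason `\b[:upper:]{2,}\b` dumped every reply: a POSIX class name is only special inside an outer bracket expression, as in `[[:upper:]]`. With single brackets, some engines read `[:upper:]` as an ordinary set holding the characters `:`, `u`, `p`, `e` and `r`, and under case-insensitive matching that set matches the "Re" of "Re:". The sketch below reproduces the effect with Python's `re` module. Python is an assumption made for illustration only, Pan's own engine may behave differently, and since Python has no POSIX classes `[A-Z]` stands in for `[[:upper:]]`.

```python
import re

# Single-bracket form: Python reads this as the set {':', 'u', 'p', 'e', 'r'}.
# IGNORECASE is assumed here to mimic a case-insensitive filter.
broken = re.compile(r"\b[:upper:]{2,}\b", re.IGNORECASE)

# Intended meaning: a whole word of two or more capital letters.
# Python has no [[:upper:]], so [A-Z] is used as a stand-in.
intended = re.compile(r"\b[A-Z]{2,}\b")

def first_match(pattern, subject):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(subject)
    return m.group(0) if m else None

if __name__ == "__main__":
    for subject in ["Re: Just a test post", "Just a test post", "BUY NOW"]:
        print(subject, "|", first_match(broken, subject), "|", first_match(intended, subject))
```

The single-bracket pattern picks out "Re" in a reply subject and nothing in the same subject without the prefix, which would fit the "dumped all replies" symptom.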
> 
> Have a look at the link I sent - all the info's in there somewhere, I think
> :)
> 
> > > http://www.pcre.org/pcre.txt).
Yes, it's somewhere alright <g> I've already noticed some interesting
elements which may apply. Sure won't hurt to try them out.
> 
> 
> > > (?-i)\b[A-Z]{2,}\b
> > This works, sort-of, if I select NONE OF:, but things like 
> > "!?&" in the string break it.
> 
> (All the below untested as before)
> 
> So, I'm unclear what you want. How about keeping things with at least one
> word with at least one lower case letter in the middle of it?
> 
> (?-i)\b.+[a-z].+\b
This logs an error: Can't use regex "(?-i)\b.+[a-z].+\b": Invalid
preceding regex.
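Leaving the error aside, it may help to see what this pattern would select in an engine that accepts it. The sketch below uses Python's `re` module as a stand-in (an assumption, not Pan's engine). Python matches case-sensitively by default, so the `(?-i)` prefix is dropped, and `is_kept` is a hypothetical helper name.

```python
import re

# Python matching is case-sensitive unless told otherwise, so the
# "(?-i)" prefix from the original suggestion is left out here.
keep = re.compile(r"\b.+[a-z].+\b")

def is_kept(subject):
    """True when the subject has a lower case letter with text on both sides."""
    return keep.search(subject) is not None

if __name__ == "__main__":
    for subject in ["Just a test post", "BUY NOW!!!", "FREE STUFF here"]:
        print(subject, "|", is_kept(subject))
```

Note that a mostly upper case subject such as "FREE STUFF here" is still kept, because a single lower case letter away from the ends is enough to satisfy the pattern.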
> 
> > What I've been reading says that the ? refers to "zero or more times"
> > (this must be my "snake & necklace" problem again).
> 
> It's the ( followed by ? that is important here. In other contexts ? is a
> quantifier, though strictly it means zero or one (* is zero or more)
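For reference, the two roles of `?` can be shown side by side. As a quantifier it means "zero or one" of the preceding item, while `*` is the one that means "zero or more". Directly after an opening parenthesis it is not a quantifier at all and instead introduces special group syntax. This may also bear on the "Invalid preceding regex" error above: `(?-i)` is a Perl/PCRE extension, and an engine that does not implement it would see a `?` with nothing before it to repeat. That is a guess about Pan, not something the thread confirms. The sketch uses Python's `re` module purely for illustration, and the pattern names are made up.

```python
import re

# As a quantifier, "?" makes the preceding item optional (zero or one).
optional_u = re.compile(r"colou?r")

# "*" is the quantifier that means zero or more.
any_number_of_o = re.compile(r"go*gle")

# After "(", the "?" is not a quantifier; it introduces special group
# syntax. "(?i)" switches on case-insensitive matching in Python,
# and "(?:...)" is a group that does not capture.
inline_flag = re.compile(r"(?i)re: ")

if __name__ == "__main__":
    print(bool(optional_u.fullmatch("color")), bool(optional_u.fullmatch("colour")))
    print(bool(any_number_of_o.fullmatch("ggle")), bool(any_number_of_o.fullmatch("gooooogle")))
    print(bool(inline_flag.match("RE: test")))
```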
> > 
> > I want to dump as many of the annoying spam, troll and 
> > AOL-keyboard posts as I can, which I think, will require 
> > parsing the string's individual characters, multiple times 
> > (maybe my approach is flawed?) Once for ALL CAPS (if true, 
> > dump the post, regardless of additional characters in the 
> > string).
> 
> So dump lines that don't match
> 
> (?-i)[a-z]
> 
> maybe (that is, lines with no lower case character at all)
This also logs an error: Can't use regex "(?-i)[a-z]": Invalid preceding
regex. Could this be caused by the condition I set in Pan (NONE OF:)? I
think so as changing the condition to ANY OF: or ALL OF: does not log an
error. This bites me often.
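The "no lower case letter anywhere" idea can be sketched outside Pan to check the logic. The example below is Python, chosen only for illustration, and it assumes that NONE OF means "the rule fires when the pattern does not match". `is_all_caps` and the prefix-stripping step are hypothetical additions: without stripping, the lower case "e" in "Re:" would make every all-caps reply look mixed-case.

```python
import re

LOWER = re.compile(r"[a-z]")
UPPER = re.compile(r"[A-Z]")
# Hypothetical helper pattern: strips any leading "Re:" prefixes before testing.
REPLY_PREFIX = re.compile(r"^(?:re:\s*)+", re.IGNORECASE)

def is_all_caps(subject):
    """True when the subject has capital letters and no lower case ones."""
    body = REPLY_PREFIX.sub("", subject)
    return UPPER.search(body) is not None and LOWER.search(body) is None

if __name__ == "__main__":
    for subject in ["BUY NOW!?&", "Re: BUY NOW", "Just a test post", "12345"]:
        print(subject, "|", is_all_caps(subject))
```

Because the test is "no lower case letter" rather than "only upper case letters", punctuation such as "!?&" in the subject does not interfere.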
> 
> > After that, it gets interesting. Now we should have
> > mixed-case alpha and/or alpha-numeric (or "should" have).
> 
> So, don't do anything with these (leave them with the default score which
> means they'll be shown)
Before they reach the point of being displayed, I want to check those
results and further qualify them.

> 
> > Next, filter on multiple instances (2 or more to start) of 
> > any non-alpha, printable characters, anywhere in the string. 
> 
> Do you mean the same character repeated? This one's interesting. I think we
> can use backreferences here....
> 
> Keep lines that don't match
> 
> [:punct:]\1
Is this like recursion or repeatedly calling a subroutine until a
specified condition is met or one has run out of options?
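On the question itself: a backreference is not recursion or a repeated call. `\1` simply matches again whatever text the first parenthesised group captured. As quoted, `[:punct:]\1` has no group for `\1` to refer to, so in PCRE-style syntax the pattern would need to be something like `([[:punct:]])\1`. The sketch below shows the same idea in Python, which is an assumption for illustration. Python has no `[[:punct:]]`, so the set is built from `string.punctuation` (ASCII only), and `doubled_punctuation` is a made-up helper name.

```python
import re
import string

# string.punctuation is ASCII only; re.escape makes every character
# safe to place inside the [...] set.
PUNCT_SET = "[" + re.escape(string.punctuation) + "]"

# The parentheses capture one punctuation character; \1 then requires
# that same character to appear again immediately afterwards.
DOUBLED_PUNCT = re.compile("(" + PUNCT_SET + r")\1")

def doubled_punctuation(subject):
    """Return the doubled punctuation found in the subject, or None."""
    m = DOUBLED_PUNCT.search(subject)
    return m.group(0) if m else None

if __name__ == "__main__":
    for subject in ["Hello!!", "Hello!", "What?!", "$$$ cash"]:
        print(subject, "|", doubled_punctuation(subject))
```

"What?!" is not caught because the two characters differ. Catching any two punctuation characters in a row would need `PUNCT_SET` repeated twice instead of a backreference.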

> 
> > Dump the matches. Then filter those results against any other 
> > specific criteria until what remains are subjects that look 
> > "normal" as in: Just a test post | Just A Test Post | Just a 
> > Test Post #10 | any of the previous, prefixed by "Re:", etc.
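One way to pin down what "normal" means is a single allow-list pattern built from the examples just given. The sketch below is Python, used only for illustration, and the exact shape is an assumption: optional "Re:" prefixes, words made of letters separated by single spaces, and an optional trailing "#" plus digits. `looks_normal` is a hypothetical name.

```python
import re

# Assumed shape of a "normal" subject: optional "Re:" prefixes, words made
# of letters separated by single spaces, and an optional trailing "#<number>".
NORMAL = re.compile(r"^(?:Re:\s*)*[A-Za-z]+(?: [A-Za-z]+)*(?: #\d+)?$")

def looks_normal(subject):
    """True when the whole subject fits the assumed 'normal' shape."""
    return NORMAL.match(subject) is not None

if __name__ == "__main__":
    for subject in ["Just a test post", "Just a Test Post #10",
                    "Re: Just a test post", "BUY NOW!!!"]:
        print(subject, "|", looks_normal(subject))
```

This pattern alone would still accept an all-caps subject with no punctuation, so it is meant to run after the earlier filters rather than replace them.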
> 
> These should be straightforward?
> 
> What are you setting the score to for each of these?
Presently, all scoring is default, as are the rules. I wanted to get a
functioning set of filters before I started messing around with scoring
and the rules.

> 




