
[Pan-users] Re: Regex question - Re: ANN: Pan 0.14.2


From: Duncan
Subject: [Pan-users] Re: Regex question - Re: ANN: Pan 0.14.2
Date: Mon, 01 Sep 2003 04:41:40 -0700
User-agent: Pan/0.14.0.95 (Pan Contains 70 Lines of SCO Code)

Wolf J. Flywheel posted <address@hidden>, excerpted
below,  on Sun, 31 Aug 2003 18:40:49 -0400:

> "Can't use regular expression 
> "^.*[[:<:]]free[[:>:]].*$": Invalid character class name".  (Actually, in 
> the score file, it's "FREE".)  Now, that one is supposed to catch the 
> ever-popular spam that screams about "FREE (SEX|WAREZ|FRUIT)..." and the 
> [[:<:]]/[[:>:]] constructs are supposed to indicate word boundaries.  
> Therefore, any subject that contains "blah blah FREE blah" should be 
> scored down a bit.
> 
>       I got my regex information from the Bash documentation... should I be 
> using a different sort of regex?

I've never seen that either, but shell programming != regex programming,
in general.  In most regex engines, [] indicates a character class, while
in BASH, [ is another way of writing "test", a shell builtin command.

Many regex engines use POSIX character lists, denoted by [::], but
recognized only within character classes, so it's common to see for
instance [[:alnum:]] for a character class consisting solely of
alpha-numeric characters as denoted by the character list within the
character class.  (Note that these character lists are intended to be i18n
aware, making this more portable across languages than, say,
[a-zA-Z0-9].)  Of course, one could also use [[:alnum:]_-] for instance,
to also include the underscore and dash characters (the dash goes last so
that it is taken literally rather than as the start of a range).
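
To make the i18n point concrete, here is a minimal sketch in Python.  Note
the assumption: Python's "re" module does not support the POSIX [:alnum:]
notation, so the sketch uses [^\W_] as the nearest stand-in (in Python 3
string patterns, \w is Unicode-aware).  The names ASCII_ALNUM,
UNICODE_ALNUM, ALNUM_DASH_UNDERSCORE and is_match are made up for
illustration only.

```python
import re

# ASCII-only class: the less portable way of saying "alphanumeric".
ASCII_ALNUM = re.compile(r"[a-zA-Z0-9]+")

# Stand-in for [[:alnum:]] (an assumption, since Python's re has no POSIX
# classes): "word characters except underscore". \w is Unicode-aware for
# str patterns in Python 3.
UNICODE_ALNUM = re.compile(r"[^\W_]+")

# Stand-in for "alphanumerics plus underscore and dash". The dash is placed
# last inside the brackets so it is a literal dash, not a range operator.
ALNUM_DASH_UNDERSCORE = re.compile(r"[\w-]+")


def is_match(pattern, text):
    """Return True if the whole of text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None
```

With these, is_match(ASCII_ALNUM, "café") is False while
is_match(UNICODE_ALNUM, "café") is True, which is the portability
difference described above.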

The Pan warning is therefore correct in this case, as there is no POSIX
character list named "<", nor is there one named ">", AFAIK.
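
For the word-boundary part, many Perl-style engines offer \b instead.  The
following is only a sketch in Python to show the idea; whether Pan's own
regex engine accepts \b in a score file is an assumption I have not
verified, and the names FREE_WORD and shouts_free are hypothetical.

```python
import re

# \b is the Perl-style word-boundary assertion. Here it stands in for the
# [[:<:]] / [[:>:]] pair from the original expression. The match is
# case-sensitive, as with the upper-case "FREE" in the score file.
FREE_WORD = re.compile(r"^.*\bFREE\b.*$")


def shouts_free(subject):
    """Return True if the subject contains FREE as a separate word."""
    return FREE_WORD.match(subject) is not None
```

A subject such as "blah blah FREE blah" matches, while "FREEDOM for all"
does not, since there is no word boundary between "FREE" and "DOM".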

As for decent documentation on regexps, a book I use for this and all
sorts of other Linux reference type purposes (incl. BASH programming, and
the various command line options for the normal Linux bestiary of
commands, plus more), is O'Reilly's "Linux in a Nutshell" (aka "The
Arabian", for the illustration on the cover, as is common with O'Reilly
books), which should now be out in 4th edition, I think. (I have the
third, but it's a little long in the tooth for a Linux reference, since
it's from 2000. I've noticed it most in the fact that it deals with LILO
but not GRUB.) Among its appendices is one dealing with regexps.

Any good Perl programming book, incl. O'Reilly's "Programming Perl" (aka
"The Camel") and "Learning Perl" (aka "The Llama"), both of which I have,
should include a good regexp section, as these do.  There's also an
O'Reilly "Mastering Regular Expressions" book that I do NOT have (nor am
I familiar with its mascot handle), but which I may well get at some point.

(Hey.. I'm not spending it on $400-1000+ proprietary-ware programming
suites now, and adding insult to injury, those suites, at least the ones
from Monopo$oft, quit shipping with decent "dead tree" documentation some
time before the turn of the century, one of the reasons I switched to
Linux, as it happens.  If I'm going to spend that kind of $$, I want some
documentation for my $$.
On Linux, the programming languages are free, as is the online
documentation, but the dead trees still cost $$... which I don't mind
paying, since I get something I can actually USE..)

Online (and free..) try the Qt (KDE) regex doc page, here:
http://doc.trolltech.com/qregexp.html#details.

I'm sure there are others, including Perl regexp ones, but that's a
convenient one.  Also, the various man pages for grep, sed, etc. may
have some info.

-- 
Duncan - List replies preferred.   No HTML msgs.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety." --
Benjamin Franklin





