Re: grep : problem with locale
From: John Cowan
Date: Thu, 20 Apr 2006 20:12:55 -0400
User-agent: Mutt/1.3.28i
Sylvain scripsit:
> There is a little problem with the man page:
> Finally, certain named classes of characters are predefined within bracket
> expressions, as follows. Their names are self explanatory, and they are
> [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:],
> [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example,
> [[:alnum:]] means [0-9A-Za-z], except the latter form depends upon the C
> locale and the ASCII character encoding, whereas the former is independent
> of locale and character set. (Note that the brackets in these class names
> are part of the symbolic names, and must be included in addition to the
> brackets delimiting the bracket list.) Most metacharacters lose their
> special meaning inside lists. To include a literal ] place it first in the
> list. Similarly, to include a literal ^ place it anywhere but first.
> Finally, to include a literal - place it last.
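The three literal-character rules at the end of that excerpt can be tried without grep. This is a minimal sketch that uses Python's `re` module as a stand-in. It is an assumption that this is a fair proxy: `re` is not a POSIX regex engine and has no `[:alpha:]`-style classes, but it treats `]`, `^` and `-` inside brackets the same way. The names `LITERALS` and `matches` are hypothetical.

```python
import re

# "]" placed first, "^" placed anywhere but first, "-" placed last:
# all three are literals, so this set matches exactly ] a ^ -
LITERALS = re.compile(r"[]a^-]")

def matches(ch: str) -> bool:
    """Return True if the single character ch belongs to the bracket set."""
    return LITERALS.fullmatch(ch) is not None

for ch in "]a^-b":
    print(repr(ch), matches(ch))
```

The POSIX equivalent for grep would be the same bracket expression, `[]a^-]`.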
I agree that the wording is confusing. What's meant is that the form
[A-Za-z] will match a letter only if by "letter" you mean "English
letter" and your encoding is ASCII-compatible (on an EBCDIC system,
where the letters are not contiguous, it matches too much). [:alpha:], on the other hand, will match
whatever counts as a letter on the local system and will be independent
of character encoding. In that sense, then, using [:alpha:] is locale-
and encoding-independent, assuming that what you want is to match a letter,
whereas [A-Za-z] is neither.
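The EBCDIC remark can be checked with Python's `cp037` codec, one common EBCDIC code page. Two assumptions apply: that code page is only one choice (others differ in detail), and the sketch models a naive byte-range reading of `[a-z]`, which real regex engines on EBCDIC systems may not use. Because EBCDIC stores the lowercase letters in three separate runs, the bytes from `a` to `z` also include punctuation and accented characters. The function `byte_range_chars` is hypothetical.

```python
import string
from typing import List

def byte_range_chars(lo: str, hi: str, codec: str = "cp037") -> List[str]:
    """Decode every byte value from encode(lo) to encode(hi), inclusive."""
    start = lo.encode(codec)[0]  # 'a' is 0x81 in cp037
    end = hi.encode(codec)[0]    # 'z' is 0xA9 in cp037
    return [bytes([b]).decode(codec) for b in range(start, end + 1)]

# What a naive byte-range [a-z] would cover on an EBCDIC (cp037) system
span = byte_range_chars("a", "z")
extras = [c for c in span if c not in string.ascii_lowercase]

print(len(span), "byte values in the range;", len(extras), "are not a-z")
print(" ".join(repr(c) for c in extras))
```

The range covers 41 byte values, of which only 26 are lowercase letters.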
That said, IMHO this whole business of locale-dependent letters is folly.
The letter e-acute is just as much a letter in the English word "resumé"
as in the French word "résumé"; for that matter, thorn is a letter in
both English and French contexts, though neither English nor French uses
it -- what else could it possibly be?
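Unicode itself takes the same view: é and thorn have the letter property whatever the locale. The sketch below illustrates this in Python, whose `str.isalpha()` consults the Unicode character database rather than the current locale, and contrasts it with the ASCII range `[A-Za-z]`. The helper `ascii_only_word` is hypothetical. The words are written with escapes so that é is the single precomposed code point U+00E9; a decomposed e plus combining accent would not pass `isalpha()`.

```python
import re

ASCII_RANGE = re.compile(r"[A-Za-z]+")

def ascii_only_word(word: str) -> bool:
    """True if every character falls in the ASCII range [A-Za-z]."""
    return ASCII_RANGE.fullmatch(word) is not None

# resume, resumé, résumé, and thorn (U+00FE)
for word in ["resume", "resum\u00e9", "r\u00e9sum\u00e9", "\u00fe"]:
    # str.isalpha() uses Unicode letter categories, independent of locale
    print(word, ascii_only_word(word), word.isalpha())
```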
--
After fixing the Y2K bug in an application: John Cowan
WELCOME TO <censored> address@hidden
DATE: MONDAK, JANUARK 1, 1900 http://www.ccil.org/~cowan