bug-gnu-utils

Re: Case insensitivity seems to ignore lower bound of interval


From: Davide Brini
Subject: Re: Case insensitivity seems to ignore lower bound of interval
Date: Thu, 28 Apr 2011 14:27:55 +0100

On Thu, 28 Apr 2011 15:12:14 +0200 Eric Bischoff <address@hidden>
wrote:

> Le jeudi 28 avril 2011 14:04:28, Davide Brini a écrit :
> > But you got me curious. My original test system used mostly vanilla
> > tools built from source, so I thought I would try on stock distros
> > instead. And guess what, both on a standard RHEL 6 and Debian squeeze,
> > I see your results (ie gawk behaves differently).
> 
> OK, that's interesting information. That means that either those
> distributions patch the tools, either there's some compilation option
> that differs.
> 
> I have tested 6 stock distros too, all with the same behaviour for gawk (I
> have not tested sed nor grep yet).

I have done some further investigation using sed (which seems to be the
most controversial one). It turns out that it can be compiled to use an
internal RE library, or the libc-provided one.
There is a --{with,without}-included-regex switch to the configure script
to explicitly specify this. 

On my system, when building from source without options, configure fails
(for reasons I have not investigated) to detect a suitable external RE
library, so it builds with the internal one, which exhibits the behavior I
originally saw (why it behaves that way is another matter, and not relevant
here).

I suppose that most distros instead build with --without-included-regex (a
quick look at the RHEL SRPM seems to confirm that).
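
To compare the two variants side by side, something along these lines could
be used. It is only a sketch: the helper name regex_flag and the install
prefix in the usage comment are made up for illustration, and only the two
configure switches come from the sed build system as described above.

```shell
# Map a regex choice to the sed configure switch described above.
# "internal" selects the bundled RE library, "libc" the system one.
regex_flag() {
  case "$1" in
    internal) echo "--with-included-regex" ;;
    libc)     echo "--without-included-regex" ;;
    *)        echo "usage: regex_flag internal|libc" >&2; return 1 ;;
  esac
}

# Hypothetical usage, run from an unpacked sed source tree
# (the prefix is an assumption, pick any scratch directory):
#   ./configure "$(regex_flag libc)" --prefix="$HOME/sed-libc" && make && make install
```

Building once with each switch into separate prefixes gives two sed binaries
that can be run against the same input.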

There may be similar issues at play for awk and grep (not checked).
 
> > something different. Arnold is the authoritative source here.
> 
> Yes, and let me take the occasion to thank him for all the great work on
> gawk and the nice documentation.

+1.

> That does not make it more logical :-), and sed is definitely different:
> $ echo 'ijklmnopqs' | sed '/[R-Z]/p'
> ijklmnopqs
> $ echo 'ijklmnopqr' | sed '/[R-Z]/p'
> ijklmnopqr
> $ echo 'ijklmnopqR' | sed '/[R-Z]/p'
> ijklmnopqR
> ijklmnopqR
> $ echo 'ijklmnopqS' | sed '/[R-Z]/p'
> ijklmnopqS
> ijklmnopqS

Yes. While I'm not a fan of the non-C locale behavior myself, I think that
all tools should at least behave consistently for a given locale. There are
probably many historical reasons behind these discrepancies.

-- 
D.
