bug-grep

Re: [PATCH 2/9] dfa: fix handling of ranges in multibyte character sets


From: Jim Meyering
Subject: Re: [PATCH 2/9] dfa: fix handling of ranges in multibyte character sets
Date: Mon, 15 Mar 2010 12:11:31 +0100

Paolo Bonzini wrote:
>> Hi Paolo,
>>
>> Do you have a test that exercises this fix?
>> As far as I can see, the above tests currently succeed
>> with grep built from master.  I expected them to fail.
>
> It passes because strcoll (and hence ranges) is case-insensitive in
> many locales:
>
> $ printf '1\ny\n.\n' | LC_ALL=en_US.UTF-8 grep '[A-Z]'
> y
>
> (Note the absence of -i.)  It would fail in something like C.UTF-8, but
> that locale is not portable; as far as I know it only works under Cygwin,
> and not even glibc supports it:
>
> $ LC_ALL=C.UTF-8 bash
> bash: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
>
> So I included the test more for completeness than anything else,
> hoping that we get coverage on a system where strcoll is case
> sensitive.

Well, I would really like a test that passes with that fix and fails
without it, so how about using something like the following?

This shows that grep-2.5.3 gets it wrong (Z is missing from the output):

    $ printf '%s\n' A Z | LC_ALL=en_US.UTF-8 grep -i '[a-z]'
    A

and with your fix, grep -i does what we would expect:

    $ printf '%s\n' A Z | LC_ALL=en_US.UTF-8 src/grep -i '[a-z]'
    A
    Z
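
The comparison above could be wrapped into a small script along these lines. This is only a rough sketch, not the test that was committed: the function name check_icase_range, the GREP variable, and the use of exit status 77 to mean "skipped" are assumptions made for illustration. It skips when en_US.UTF-8 is not installed, since otherwise the result would say nothing about the fix.

```shell
#!/bin/sh
# Sketch of a regression check for "grep -i '[a-z]'" in a UTF-8 locale.
# Assumptions (not from the thread): the helper name, the GREP variable,
# and 77 as the "skip" status.

check_icase_range() {
  grep_cmd=${1:-grep}
  loc=en_US.UTF-8

  # Skip unless the locale is installed and really uses UTF-8.
  charmap=$(LC_ALL=$loc locale charmap 2>/dev/null)
  [ "$charmap" = UTF-8 ] || return 77

  # With -i, both uppercase letters must match the lowercase range.
  actual=$(printf '%s\n' A Z | LC_ALL=$loc "$grep_cmd" -i '[a-z]')
  expected=$(printf '%s\n' A Z)
  [ "$actual" = "$expected" ]
}

# Run against the grep under test (defaults to the one in PATH).
status=0
check_icase_range "${GREP:-grep}" || status=$?
case $status in
  0)  echo PASS ;;
  77) echo SKIP ;;
  *)  echo FAIL ;;
esac
```

Running it with GREP=src/grep from the build directory would exercise the patched binary; pointing it at a grep-2.5.3 binary should report FAIL, going by the output shown above.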
