Re: [PATCH 2/9] dfa: fix handling of ranges in multibyte character sets
From: Jim Meyering
Date: Mon, 15 Mar 2010 15:00:39 +0100

Paolo Bonzini wrote:
>> Well, I would really like a test that passes with,
>> and fails without, that fix, so how about using something like this:
>>
>> This shows that grep-2.5.3 gets it wrong:
>>
>> $ printf '%s\n' A Z | LC_ALL=en_US.UTF-8 grep -i '[a-z]'
>> A
>>
>> and with your fix, grep -i does what we would expect:
>>
>> $ printf '%s\n' A Z | LC_ALL=en_US.UTF-8 src/grep -i '[a-z]'
>> A
>> Z
>
> Great, I'll squash this in:
>
> diff --git a/tests/case-fold-char-range b/tests/case-fold-char-range
> index e683da9..9b3120f 100644
> --- a/tests/case-fold-char-range
> +++ b/tests/case-fold-char-range
> @@ -3,18 +3,19 @@
> : ${srcdir=.}
> . "$srcdir/init.sh"; path_prepend_ ../src
>
> -printf 'Y\n' > exp1 || framework_failure
> +printf 'A\nZ\n' > exp1 || framework_failure
> fail=0
>
> for LOC in en_US.UTF-8 zh_CN $LOCALE_FR_UTF8; do
> - printf '1\nY\n.\n' | LC_ALL=$LOC grep -i '[a-z]' > out1 || fail=1
> + printf 'A\n1\nZ\n.\n' | LC_ALL=$LOC grep -i '[a-z]' > out1 || fail=1
> compare out1 exp1 || fail=1
> done
>
> -printf 'y\n' > exp2 || framework_failure
> +# This also passes with grep-2.5.3
> +printf 'a\nz\n' > exp2 || framework_failure
>
> for LOC in en_US.UTF-8 zh_CN $LOCALE_FR_UTF8; do
> - printf '1\ny\n.\n' | LC_ALL=$LOC grep -i '[A-Z]' > out2 || fail=1
> + printf 'a\n1\nz\n.\n' | LC_ALL=$LOC grep -i '[A-Z]' > out2 || fail=1
> compare out2 exp2 || fail=1
> done
>
> (tested to fail before and pass after my patch)
Perfect.
Please add a comment like this just before
your changed lines in dfa.c:
/* Map a case-folded range, say [m-z] (or even [M-z]), to the
pair of ranges, [m-z] [M-Z]. */
Then, this one is good to go.