bug-grep

Re: [PATCH] dfa: fix case folding logic for character ranges


From: Jim Meyering
Subject: Re: [PATCH] dfa: fix case folding logic for character ranges
Date: Tue, 07 Jun 2011 11:11:09 +0200

Paolo Bonzini wrote:
> * src/dfa.c (setbit_case_fold): Remove, replace with...
> (setbit_wc, setbit_c, setbit_case_fold_c): ... these.
> (parse_bracket_exp): Use setbit_case_fold_c when iterating over
> single-byte sequences.  Use setbit_wc for multi-byte character sets,
> and setbit_case_fold_c for single-byte character sets.
> (lex): Use setbit_case_fold_c for single-byte character sets.
> ---
>       > At first I was going to say this:
>       >
>       >   You are using ru_RU.KOI8-R, which is a uni-byte locale, yet your
>       >   inputs (both stdin and the grep regexp) use the two-byte
>       >   representation \xd0\x9f for П, instead of the uni-byte \360.
>       >
>       > But it fails even with the single-byte version.
>       > So it is indeed a bug in grep, but at least this time
>       > it affects relatively few locales.
>       >
>       > Here's the fix I expect to use and a test case to exercise it.
>
>         The bug affects all single-byte locales except ISO-8859-1 ones.
>         It is quite serious---the logic to map wide characters back to
>         bytes makes no sense.
>
>         The attached patch fixes it and does not regress high-bit-range,
>         while removing the debatable uses of wctob and checks for EOF.  Ok
>         to apply together with your testcase?
> ---
>  src/dfa.c |  102 ++++++++++++++++++++++++++++++++++---------------------------
>  1 files changed, 57 insertions(+), 45 deletions(-)

Hi Paolo,

Thanks for following through on this.
At first glance (I'll look carefully today)
this looks like the right approach.

However, I've gone ahead and pushed my patch and test case,
since it does solve the problem at hand, and I have not seen
inputs that make that code misbehave.
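
For readers following the changelog quoted above, here is a minimal sketch of
what byte-level case folding for a single-byte character set can look like.
It is not the code from either patch: the byteset type and the byteset_*
function names are hypothetical, and it assumes that toupper and tolower from
<ctype.h> give the right case mapping for the current single-byte locale. The
point it illustrates is that the fold works directly on unsigned char values,
with no round trip through wide characters.

```c
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Hypothetical bit set with one bit per byte value.
   This is an illustration, not the charclass type used in src/dfa.c. */
typedef struct
{
  unsigned char bits[(UCHAR_MAX + 1) / CHAR_BIT];
} byteset;

/* Empty the set. */
static void
byteset_clear (byteset *s)
{
  memset (s->bits, 0, sizeof s->bits);
}

/* Add the single byte B to the set. */
static void
byteset_add (byteset *s, unsigned char b)
{
  s->bits[b / CHAR_BIT] |= (unsigned char) (1u << (b % CHAR_BIT));
}

/* Return nonzero if the byte B is in the set. */
static int
byteset_has (const byteset *s, unsigned char b)
{
  return (s->bits[b / CHAR_BIT] >> (b % CHAR_BIT)) & 1;
}

/* Add B and, when FOLD is nonzero, its upper- and lower-case
   counterparts.  The mapping stays in the byte domain: the argument
   to toupper/tolower is always an unsigned char value.  */
static void
byteset_add_case_fold (byteset *s, unsigned char b, int fold)
{
  byteset_add (s, b);
  if (fold)
    {
      byteset_add (s, (unsigned char) toupper (b));
      byteset_add (s, (unsigned char) tolower (b));
    }
}
```

With setlocale (LC_ALL, "") called first in a KOI8-R locale, one would expect
the same helper to pair byte \360 with its other-case counterpart, provided
the C library's locale tables define that mapping.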


