Re: [PATCH 2/4] dfa: correct handling of single-byte character ranges


From: Jim Meyering
Subject: Re: [PATCH 2/4] dfa: correct handling of single-byte character ranges
Date: Tue, 07 Jun 2011 13:24:05 +0200

Paolo Bonzini wrote:
> This provides a better fix for the unibyte-bracket-expr and high-bit-range
> testcases, and fixes the latent bug tested by bogus-wctob.
>
> * src/dfa.c (setbit_case_fold): Remove, replace with...
> (setbit_wc, setbit_c, setbit_case_fold_c): ... these.
> (parse_bracket_exp): Use setbit_case_fold_c when iterating over
> single-byte sequences.  Use setbit_wc for multi-byte character sets,
> and setbit_case_fold_c for single-byte character sets.
> (lex): Use setbit_case_fold_c for single-byte character sets.
> ---
>  src/dfa.c |  106 +++++++++++++++++++++++++++++++++----------------------------
>  1 files changed, 57 insertions(+), 49 deletions(-)
>
> diff --git a/src/dfa.c b/src/dfa.c
> index 83386aa..6602ae8 100644
> --- a/src/dfa.c
> +++ b/src/dfa.c
> @@ -536,55 +536,67 @@ dfasyntax (reg_syntax_t bits, int fold, unsigned char eol)
>    eolbyte = eol;
>  }
>
> -/* Like setbit, but if case is folded, set both cases of a letter.
> -   For MB_CUR_MAX > 1, one or both of the two cases may not be set,
> -   so the resulting charset may only be used as an optimization.  */
> -static void
> -setbit_case_fold (
> +/* Set a bit in the charclass for the given wchar_t.  Do nothing if WC
> +   is represented by a multi-byte sequence.  Even for MB_CUR_MAX == 1,
> +   this may happen when folding case in weird Turkish locales where
> +   dotless i/dotted I are not included in the chosen character set.
> +   Return whether a bit was set in the charclass.  */
>  #if MBS_SUPPORT
> -                  wint_t b,
> +static bool
> +setbit_wc (wint_t wc, charclass c)
> +{
> +  int b = wctob (wc);
> +  if (b != EOF)
> +    {
> +      setbit (b, c);
> +      return true;
> +    }
> +  else
> +    return false;
> +}

ACK, with the reordering suggestion I posted separately.
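
For readers without dfa.c at hand, here is a minimal standalone sketch of the wctob-guarded approach used by setbit_wc above. The sketch_* names and the bitmap layout are hypothetical stand-ins, not the real charclass type or setbit macro from dfa.c; only wctob, EOF and WEOF come from the C library.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for dfa.c's charclass: one bit per byte value.
   The real type in dfa.c is laid out differently.  */
enum { SKETCH_NBYTES = UCHAR_MAX + 1 };
typedef unsigned char sketch_charclass[SKETCH_NBYTES / CHAR_BIT];

/* Set the bit for byte value B (assumed to be in 0..UCHAR_MAX).  */
static void
sketch_setbit (unsigned int b, sketch_charclass c)
{
  assert (b <= UCHAR_MAX);
  c[b / CHAR_BIT] |= 1u << (b % CHAR_BIT);
}

/* Return true if the bit for byte value B is set.  */
static bool
sketch_tstbit (unsigned int b, const sketch_charclass c)
{
  assert (b <= UCHAR_MAX);
  return (c[b / CHAR_BIT] >> (b % CHAR_BIT)) & 1;
}

/* Same shape as the quoted setbit_wc: set a bit only when WC has a
   single-byte representation in the current locale, and report
   whether a bit was set.  */
static bool
sketch_setbit_wc (wint_t wc, sketch_charclass c)
{
  int b = wctob (wc);
  if (b == EOF)
    return false;
  sketch_setbit ((unsigned int) b, c);
  return true;
}
```

In the default "C" locale, calling sketch_setbit_wc (L'a', cc) on a zeroed class sets the bit for 'a' and returns true, while sketch_setbit_wc (WEOF, cc) leaves the class untouched and returns false, since WEOF never maps to a single byte.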

> +/* Set a bit in the charclass for the given single byte character,
> +   if it is valid in the current character set.  */
> +static void
> +setbit_c (int b, charclass c)
> +{
> +  /* Do nothing if b is invalid in this character set.  */
> +  if (MB_CUR_MAX > 1 && btowc (b) == EOF)
> +    return;
> +  setbit (b, c);
> +}
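
A similar standalone sketch of the single-byte check in setbit_c, under the same caveat: the demo_* names and the bitmap are hypothetical, not dfa.c's own. One detail worth noting is that btowc is specified to return WEOF (a wint_t) on failure, so this sketch compares against WEOF, whereas the quoted hunk compares against EOF.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for dfa.c's charclass: one bit per byte value.  */
typedef unsigned char demo_charclass[(UCHAR_MAX + 1) / CHAR_BIT];

/* Return true if byte B is a complete character in the current locale.
   In a unibyte locale every byte is accepted, mirroring the
   MB_CUR_MAX > 1 guard in the quoted setbit_c.  */
static bool
demo_byte_is_valid (int b)
{
  return MB_CUR_MAX == 1 || btowc (b) != WEOF;
}

/* Set the bit for byte B (assumed to be in 0..UCHAR_MAX) unless B is
   not a valid character on its own in a multibyte locale.  */
static void
demo_setbit_c (int b, demo_charclass c)
{
  assert (0 <= b && b <= UCHAR_MAX);
  if (!demo_byte_is_valid (b))
    return;
  c[b / CHAR_BIT] |= 1u << (b % CHAR_BIT);
}
```

In the default "C" locale MB_CUR_MAX is 1, so demo_setbit_c sets the bit for any byte it is given; the btowc check only comes into play after setlocale selects a multibyte locale such as a UTF-8 one.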


