Re: [PATCH 2/4] dfa: correct handling of single-byte character ranges
From: Jim Meyering
Subject: Re: [PATCH 2/4] dfa: correct handling of single-byte character ranges
Date: Tue, 07 Jun 2011 13:24:05 +0200
Paolo Bonzini wrote:
> This provides a better fix for the unibyte-bracket-expr and high-bit-range
> testcases, and fixes the latent bug tested by bogus-wctob.
>
> * src/dfa.c (setbit_case_fold): Remove, replace with...
> (setbit_wc, setbit_c, setbit_case_fold_c): ... these.
> (parse_bracket_exp): Use setbit_case_fold_c when iterating over
> single-byte sequences. Use setbit_wc for multi-byte character sets,
> and setbit_case_fold_c for single-byte character sets.
> (lex): Use setbit_case_fold_c for single-byte character sets.
> ---
> src/dfa.c | 106 +++++++++++++++++++++++++++++++++----------------------------
> 1 files changed, 57 insertions(+), 49 deletions(-)
>
> diff --git a/src/dfa.c b/src/dfa.c
> index 83386aa..6602ae8 100644
> --- a/src/dfa.c
> +++ b/src/dfa.c
> @@ -536,55 +536,67 @@ dfasyntax (reg_syntax_t bits, int fold, unsigned char eol)
> eolbyte = eol;
> }
>
> -/* Like setbit, but if case is folded, set both cases of a letter.
> - For MB_CUR_MAX > 1, one or both of the two cases may not be set,
> - so the resulting charset may only be used as an optimization. */
> -static void
> -setbit_case_fold (
> +/* Set a bit in the charclass for the given wchar_t. Do nothing if WC
> + is represented by a multi-byte sequence. Even for MB_CUR_MAX == 1,
> + this may happen when folding case in weird Turkish locales where
> + dotless i/dotted I are not included in the chosen character set.
> + Return whether a bit was set in the charclass. */
> #if MBS_SUPPORT
> - wint_t b,
> +static bool
> +setbit_wc (wint_t wc, charclass c)
> +{
> + int b = wctob (wc);
> + if (b != EOF)
> + {
> + setbit (b, c);
> + return true;
> + }
> + else
> + return false;
> +}
ACK, with the reordering suggestion I posted separately.
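
For readers without the dfa.c sources at hand, here is a minimal standalone
sketch of the wctob check used by setbit_wc above. The sketch_class type and
the sketch_* helpers are hypothetical stand-ins for dfa.c's charclass, setbit
and tstbit; they are assumptions made for illustration and are not the code
from the patch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>   /* EOF */
#include <string.h>  /* memset, for callers that clear the class */
#include <wchar.h>   /* wctob, wint_t, WEOF */

/* Hypothetical stand-in for dfa.c's charclass: one bit per byte value.  */
typedef struct
{
  unsigned char bits[(UCHAR_MAX + 1) / CHAR_BIT];
} sketch_class;

/* Set the bit for byte value B.  */
static void
sketch_setbit (unsigned int b, sketch_class *c)
{
  c->bits[b / CHAR_BIT] |= 1u << (b % CHAR_BIT);
}

/* Return true if the bit for byte value B is set.  */
static bool
sketch_tstbit (unsigned int b, const sketch_class *c)
{
  return (c->bits[b / CHAR_BIT] >> (b % CHAR_BIT)) & 1;
}

/* Set a bit for WC only if the current locale encodes WC as a single
   byte.  Return whether a bit was set.  */
static bool
sketch_setbit_wc (wint_t wc, sketch_class *c)
{
  int b = wctob (wc);   /* EOF when WC has no single-byte form */
  if (b == EOF)
    return false;
  sketch_setbit ((unsigned int) b, c);
  return true;
}
```

A caller would zero the class with memset and then check the return value of
sketch_setbit_wc to learn whether the wide character still needs separate
multi-byte handling.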
> +/* Set a bit in the charclass for the given single byte character,
> + if it is valid in the current character set. */
> +static void
> +setbit_c (int b, charclass c)
> +{
> + /* Do nothing if b is invalid in this character set. */
> + if (MB_CUR_MAX > 1 && btowc (b) == EOF)
> + return;
> + setbit (b, c);
> +}
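
The validity test in setbit_c can be tried in isolation with a sketch like
the following. The helper name sketch_byte_is_char is hypothetical. The
sketch compares against WEOF, which is the failure value the C standard
specifies for btowc; treat the rest as an illustration under those
assumptions rather than as the patch's code.

```c
#include <stdbool.h>
#include <stdlib.h>  /* MB_CUR_MAX */
#include <wchar.h>   /* btowc, WEOF */

/* Hypothetical helper: return true if byte B is a complete character on
   its own in the current locale.  In a single-byte locale every byte
   qualifies; in a multi-byte locale, btowc returns WEOF for bytes that
   are not valid one-byte characters.  */
static bool
sketch_byte_is_char (int b)
{
  return MB_CUR_MAX == 1 || btowc (b) != WEOF;
}
```

In the default "C" locale this accepts every byte; after switching to a
UTF-8 locale with setlocale, bytes with the high bit set would be expected
to fail the btowc test.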