[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#16481: dfa.c and Rational Range Interpretation
From: |
Aharon Robbins |
Subject: |
bug#16481: dfa.c and Rational Range Interpretation |
Date: |
Sat, 25 Jan 2014 20:27:13 +0200 |
User-agent: |
Heirloom mailx 12.5 6/20/10 |
Hi Paul & Jim,
> > What happens if you compile them in and run the grep test suite?
>
> The test suite passes, but grep is bigger and (I presume) slower. The
> GREP-related changes are for performance, and shouldn't affect behavior.
>
> How about if we apply the attached patch to dfa.c, in both gawk and
> grep? I tried it just now, and gawk passed all its tests too. Or, if
> there's some reason this patch would introduce a bug into gawk, I'd like
> to fix the grep test cases to detect the bug.
The code in question occurs in two functions, parse_bracket_exp() and atom().
The first instance is in parse_bracket_exp(), building a range expression,
where we may have multibyte characters.
....
if (c1 == '-' && c2 != ']')
{
if (c2 == '\\' && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
FETCH_WC (c2, wc2, _("unbalanced ["));
if (MB_CUR_MAX > 1)
{
/* When case folding map a range, say [m-z] (or even [M-z])
to the pair of ranges, [m-z] [M-Z]. */
REALLOC_IF_NECESSARY (work_mbc->range_sts,
range_sts_al, work_mbc->nranges + 1);
REALLOC_IF_NECESSARY (work_mbc->range_ends,
range_ends_al, work_mbc->nranges + 1);
work_mbc->range_sts[work_mbc->nranges] =
case_fold ? towlower (wc) : (wchar_t) wc;
work_mbc->range_ends[work_mbc->nranges++] =
case_fold ? towlower (wc2) : (wchar_t) wc2;
#ifndef GREP
if (case_fold && (iswalpha (wc) || iswalpha (wc2)))
{
REALLOC_IF_NECESSARY (work_mbc->range_sts,
range_sts_al, work_mbc->nranges + 1);
work_mbc->range_sts[work_mbc->nranges] = towupper (wc);
REALLOC_IF_NECESSARY (work_mbc->range_ends,
range_ends_al, work_mbc->nranges + 1);
work_mbc->range_ends[work_mbc->nranges++] = towupper (wc2);
}
#endif
}
To me this looks like when doing case folding (grep -i, IGNORECASE in gawk),
we turn the m.b. equivalent of [a-c] into [a-cA-C]. This would seem to be
necessary for correctness, and the question is why does grep not need it?
The next such bit is later on in the same function:
if (case_fold && iswalpha (wc))
{
wc = towlower (wc);
if (!setbit_wc (wc, ccl))
{
REALLOC_IF_NECESSARY (work_mbc->chars, chars_al,
work_mbc->nchars + 1);
work_mbc->chars[work_mbc->nchars++] = wc;
}
#ifdef GREP
continue;
#else
wc = towupper (wc);
#endif
}
if (!setbit_wc (wc, ccl))
{
REALLOC_IF_NECESSARY (work_mbc->chars, chars_al,
work_mbc->nchars + 1);
work_mbc->chars[work_mbc->nchars++] = wc;
}
}
while ((wc = wc1, (c = c1) != ']'));
This too looks related to case folding and ranges; if I read it
correctly, when case folding it added the lower case version and
now it has to add the uppercase version of the charcter.
Then, in atom(): (Why the bizarre leading `if (0)'?)
static void
atom (void)
{
if (0)
{
/* empty */
}
else if (MBS_SUPPORT && tok == WCHAR)
{
addtok_wc (case_fold ? towlower (wctok) : wctok);
#ifndef GREP
if (case_fold && iswalpha (wctok))
{
addtok_wc (towupper (wctok));
addtok (OR);
}
#endif
tok = lex ();
}
Here too, we're doing case folding, have added the lower case character
and need to add the upper case one.
I think to test out this code you'd need a character set where the lower
and upper case counterparts are multibyte characters and grep -i is
in effect. But I suspect that grep has so much other code to special
case grep -i that this code in dfa.c is never reached.
In short, I don't think it's right to remove this code, but I don't
know how to test it to prove that, either.
HTH,
Arnold
- bug#16481: dfa.c and Rational Range Interpretation, Aharon Robbins, 2014/01/17
- bug#16481: dfa.c and Rational Range Interpretation, Paul Eggert, 2014/01/17
- bug#16481: dfa.c and Rational Range Interpretation, Aharon Robbins, 2014/01/18
- bug#16481: dfa.c and Rational Range Interpretation, Paul Eggert, 2014/01/20
- bug#16481: dfa.c and Rational Range Interpretation, Aharon Robbins, 2014/01/20
- bug#16481: dfa.c and Rational Range Interpretation, Paul Eggert, 2014/01/21
- bug#16481: dfa.c and Rational Range Interpretation, Jim Meyering, 2014/01/21
- bug#16421: bug#16481: dfa.c and Rational Range Interpretation, Paul Eggert, 2014/01/21
- bug#16481: dfa.c and Rational Range Interpretation,
Aharon Robbins <=
- bug#16481: dfa.c and Rational Range Interpretation, Paul Eggert, 2014/01/25
- bug#16481: dfa.c and Rational Range Interpretation, Aharon Robbins, 2014/01/25