Re: Bug#624387: [bug #33198] Incorrect bracket expression when parsing i
From: Jim Meyering
Subject: Re: Bug#624387: [bug #33198] Incorrect bracket expression when parsing in ru_RU.KOI8-R (Russian locale)
Date: Sun, 05 Jun 2011 19:15:55 +0200
Paolo Bonzini wrote:
> On Sat, Jun 4, 2011 at 09:48, Jim Meyering <address@hidden> wrote:
>>> The b2 == EOF part is required for the somewhat similar bug I fixed
>>> a month ago:
>>>
>>> fix a bug whereby echo c|grep '[c]' would fail for any c in 0x80..0xff
>>> 8da41c930e03a8635cbd8c89e3e591374c232c89
>>>
>>> The corresponding test demonstrates the need:
>>>
>>> tests: exercise bug with 0x80..0xff in [...]
>>> d98338ebf842ec9b69631837eee50ebdcd543505
>
> [\xff] is not well defined for a UTF-8 locale at all, actually.
> Perhaps FETCH_WC should return wc = EOF in this case (and c = 255),
> and it could be handled on a case-by-case basis elsewhere.
>
> But if wctob returns EOF, and b > UCHAR_MAX, you have introduced an
> out-of-bounds access in setbit.
Yes, I saw that.
That's why I added the guard I mentioned in my previous mail.
I would like a test case that would segfault without the
(b < NOTCHAR) guard below. If someone can construct one,
I'll be more than happy to add it to the test suite.
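To make the concern concrete, here is a minimal sketch of a bounds-checked bit set. It is not the dfa.c code: the sketch_* names are hypothetical, and it assumes NOTCHAR is 1 << CHAR_BIT and that a character class holds one bit per byte value. It only shows why an index of NOTCHAR or more must be rejected before the array is touched.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: one bit per possible byte value, as a charclass would hold. */
enum { NOTCHAR = 1 << CHAR_BIT };
enum { WORDBITS = CHAR_BIT * sizeof (unsigned int) };
enum { CHARCLASS_WORDS = (NOTCHAR + WORDBITS - 1) / WORDBITS };

typedef unsigned int sketch_charclass[CHARCLASS_WORDS];

/* Unchecked: writes past the end of C when B >= NOTCHAR. */
static void
sketch_setbit (unsigned int b, sketch_charclass c)
{
  c[b / WORDBITS] |= 1u << (b % WORDBITS);
}

/* Return true if bit B is set in C.  B must be less than NOTCHAR. */
static bool
sketch_tstbit (unsigned int b, const sketch_charclass c)
{
  return (c[b / WORDBITS] >> (b % WORDBITS)) & 1u;
}

/* Checked variant: refuse any B that does not fit in the set.
   Return true if the bit was set, false if B was rejected. */
static bool
sketch_setbit_guarded (unsigned int b, sketch_charclass c)
{
  if (b >= NOTCHAR)
    return false;
  sketch_setbit (b, c);
  return true;
}
```

With a zeroed sketch_charclass, sketch_setbit_guarded (0xE1, cc) sets the bit, while a wide-character value such as 0x0430 is rejected instead of indexing past the array.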
Here's the patch I expect to push:
From 168577596e38981d93ea57d56d325172cfed7dc7 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Thu, 2 Jun 2011 18:03:49 +0200
Subject: [PATCH 1/5] fix the [...] bug also for relatively unusual uni-byte
encodings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* src/dfa.c (setbit_case_fold): Also handle uni-byte locales
like the one mentioned in the original report: see 2011-05-07
commit d98338eb. Re-reported by Santiago Ruano Rincón.
Note that most uni-byte locales are not affected.
* NEWS (Bug fixes): Mention it.
---
NEWS | 4 ++++
src/dfa.c | 10 +++++++---
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/NEWS b/NEWS
index 312c803..67b3fad 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ GNU grep NEWS -*- outline -*-
** Bug fixes
+ echo c|grep '[c]' would fail for any c in 0x80..0xff, with a uni-byte
+ encoding for which the byte-to-wide-char mapping is nontrivial. For
+ example, the ISO-8859-1 locales are not affected, but ru_RU.KOI8-R is.
+
grep -P no longer aborts when PCRE's backtracking limit is exceeded
Before, echo aaaaaaaaaaaaaab |grep -P '((a+)*)+$' would abort. Now,
it diagnoses the problem and exits with status 2.
diff --git a/src/dfa.c b/src/dfa.c
index b41cbb6..83386aa 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -573,10 +573,14 @@ setbit_case_fold (
   else
     {
 #if MBS_SUPPORT
-      int b2 = wctob ((unsigned char) b);
-      if (b2 == EOF || b2 == b)
+      /* Below, note how when b2 != b and we have a uni-byte locale
+         (MB_CUR_MAX == 1), we set b = b2.  I.e., in a uni-byte locale,
+         we can safely call setbit with a non-EOF value returned by wctob.  */
+      int b2 = wctob (b);
+      if (b2 == EOF || b2 == b || (MB_CUR_MAX == 1 ? (b=b2), 1 : 0))
 #endif
-        setbit (b, c);
+        if (b < NOTCHAR)
+          setbit (b, c);
     }
 }
--
1.7.6.rc0.254.gf37de
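
The condition in the hunk above packs an assignment into a conditional expression, so it may help to see the same decision spelled out with plain if statements. This is only a sketch of my reading of the patch, not code from dfa.c: sketch_byte_to_set is a hypothetical helper, NOTCHAR is assumed to be 1 << CHAR_BIT, and the MBS_SUPPORT conditional is ignored. It returns the byte whose bit would be set, or -1 when nothing would be set.

```c
#include <limits.h>
#include <stdio.h>   /* EOF */
#include <stdlib.h>  /* MB_CUR_MAX */
#include <wchar.h>   /* wctob, wint_t */

/* Assumption: same value as NOTCHAR in dfa.c. */
enum { NOTCHAR = 1 << CHAR_BIT };

/* Hypothetical helper: return the index that would be passed to setbit
   for wide character B in the current locale, or -1 if no bit is set. */
static int
sketch_byte_to_set (wint_t b)
{
  int b2 = wctob (b);

  if (b2 != EOF && (wint_t) b2 != b)
    {
      /* B has a single-byte form that differs from its wide value. */
      if (MB_CUR_MAX != 1)
        return -1;          /* multibyte locale: leave the bit unset */
      b = (wint_t) b2;      /* uni-byte locale: use the byte itself */
    }

  /* Reached when b2 == EOF, b2 == b, or b was replaced by b2.
     The guard keeps an out-of-range B away from the bit array. */
  return b < NOTCHAR ? (int) b : -1;
}
```

In the default "C" locale, sketch_byte_to_set (L'a') returns 'a', and a value such as 0x0430, which has no single-byte form there, returns -1 because of the b < NOTCHAR guard. Under ru_RU.KOI8-R the same 0x0430 would be expected to take the b = b2 branch, but I have not included output for that since it depends on the locale being installed.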