bug-grep

From: Jim Meyering
Subject: Re: Bug#624387: [bug #33198] Incorrect bracket expression when parsing in ru_RU.KOI8-R (Russian locale)
Date: Sun, 05 Jun 2011 19:15:55 +0200

Paolo Bonzini wrote:
> On Sat, Jun 4, 2011 at 09:48, Jim Meyering <address@hidden> wrote:
>>> The b2 == EOF part is required for the somewhat similar bug I fixed
>>> a month ago:
>>>
>>>     fix a bug whereby echo c|grep '[c]' would fail for any c in 0x80..0xff
>>>     8da41c930e03a8635cbd8c89e3e591374c232c89
>>>
>>> The corresponding test demonstrates the need:
>>>
>>>     tests: exercise bug with 0x80..0xff in [...]
>>>     d98338ebf842ec9b69631837eee50ebdcd543505
>
> [\xff] is not well defined for a UTF-8 locale at all, actually.
> Perhaps FETCH_WC should return wc = EOF in this case (and c = 255),
> and it could be handled on a case-by-case basis elsewhere.
>
> But if wctob returns EOF, and b > UCHAR_MAX, you have introduced an
> out-of-bounds access in setbit.

Yes, I saw that.
That's why I added the guard I mentioned in my previous mail.

I would like a test case that would segfault without the
(b < NOTCHAR) guard below.  If someone can construct one,
I'll be more than happy to add it to the test suite.
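
To make the concern concrete, here is a minimal sketch of a guarded bit-set operation. It is not the dfa.c code: the names NOTCHAR_SKETCH, INTBITS_SKETCH, charclass_sketch, setbit_guarded and tstbit_sketch are hypothetical stand-ins, and the real definitions may differ. Unlike the (b < NOTCHAR) guard in the patch, this sketch also rejects negative values.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the definitions in dfa.c; the real ones
   may differ.  */
enum { NOTCHAR_SKETCH = 1 << CHAR_BIT };   /* 256 when CHAR_BIT == 8 */
enum { INTBITS_SKETCH = CHAR_BIT * sizeof (unsigned int) };
enum { CHARCLASS_INTS_SKETCH
         = (NOTCHAR_SKETCH + INTBITS_SKETCH - 1) / INTBITS_SKETCH };

/* One bit per possible byte value.  */
typedef unsigned int charclass_sketch[CHARCLASS_INTS_SKETCH];

/* Set bit B in C only when B is a valid byte value.  Return true if
   the bit was set, false if B was out of range and C was left alone.  */
static bool
setbit_guarded (int b, charclass_sketch c)
{
  if (b < 0 || NOTCHAR_SKETCH <= b)
    return false;
  c[b / INTBITS_SKETCH] |= 1U << (b % INTBITS_SKETCH);
  return true;
}

/* Return true if B is in range and its bit is set in C.  */
static bool
tstbit_sketch (int b, const charclass_sketch c)
{
  if (b < 0 || NOTCHAR_SKETCH <= b)
    return false;
  return (c[b / INTBITS_SKETCH] >> (b % INTBITS_SKETCH)) & 1;
}
```

Without the range check, a value such as 0x430 would index well past the end of the array, which is the out-of-bounds access Paolo describes.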

Here's the patch I expect to push:

From 168577596e38981d93ea57d56d325172cfed7dc7 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Thu, 2 Jun 2011 18:03:49 +0200
Subject: [PATCH 1/5] fix the [...] bug also for relatively unusual uni-byte
 encodings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/dfa.c (setbit_case_fold): Also handle uni-byte locales
like the one mentioned in the original report: see 2011-05-07
commit d98338eb.  Re-reported by Santiago Ruano Rincón.
Note that most uni-byte locales are not affected.
* NEWS (Bug fixes): Mention it.
---
 NEWS      |    4 ++++
 src/dfa.c |   10 +++++++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 312c803..67b3fad 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ GNU grep NEWS                                    -*- outline -*-

 ** Bug fixes

+  echo c|grep '[c]' would fail for any c in 0x80..0xff, with a uni-byte
+  encoding for which the byte-to-wide-char mapping is nontrivial.  For
+  example, the ISO-8859-1 locales are not affected, but ru_RU.KOI8-R is.
+
   grep -P no longer aborts when PCRE's backtracking limit is exceeded
   Before, echo aaaaaaaaaaaaaab |grep -P '((a+)*)+$' would abort.  Now,
   it diagnoses the problem and exits with status 2.
diff --git a/src/dfa.c b/src/dfa.c
index b41cbb6..83386aa 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -573,10 +573,14 @@ setbit_case_fold (
   else
     {
 #if MBS_SUPPORT
-      int b2 = wctob ((unsigned char) b);
-      if (b2 == EOF || b2 == b)
+      /* Below, note how when b2 != b and we have a uni-byte locale
+         (MB_CUR_MAX == 1), we set b = b2.  I.e., in a uni-byte locale,
+         we can safely call setbit with a non-EOF value returned by wctob.  */
+      int b2 = wctob (b);
+      if (b2 == EOF || b2 == b || (MB_CUR_MAX == 1 ? (b=b2), 1 : 0))
 #endif
-        setbit (b, c);
+        if (b < NOTCHAR)
+          setbit (b, c);
     }
 }

--
1.7.6.rc0.254.gf37de
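
For readers unsure what a "nontrivial" byte-to-wide-char mapping means, the following sketch probes one byte with the standard btowc function. The helper name byte_to_wide_is_identity is hypothetical, and the result depends on the locale in effect and on how the C library encodes wide characters.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Classify BYTE in the current locale:
     1  if it converts to a wide character with the same numeric value,
     0  if it converts to a wide character with a different value,
    -1  if it is not a valid single-byte character (or is EOF).  */
static int
byte_to_wide_is_identity (int byte)
{
  wint_t wc = btowc (byte);
  if (wc == WEOF)
    return -1;
  return wc == (wint_t) byte;
}
```

A caller would first select the locale, e.g. with setlocale (LC_ALL, "ru_RU.KOI8-R"), assuming that locale is installed. On a system whose wide characters are Unicode code points, I would expect byte 0xC1 to yield 0 there, since KOI8-R maps it to a Cyrillic letter, and 1 in an ISO-8859-1 locale.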


