bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Boyer Moore overflow patch


From: Julian Foad
Subject: Re: Boyer Moore overflow patch
Date: Wed, 15 Jun 2005 12:05:35 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050217

Please see the revised bug-fix patch, attached.


Charles Levert wrote:
Given that, it's pointless and even ambiguous
(if LC_ALL isn't already exported) to use

   LC_ALL=... function-call ...
   LC_ALL=... function-call ...

instead of

   LC_ALL=...; export LC_ALL
   function-call ...
   function-call ...

so I propose the following patch, to be applied
before the Boyer-Moore one.

The Boyer-Moore tests should then be inserted
before the UTF-8 tests which should always remain
the last ones in the file.

If the Boyer-Moore tests come before this part, then it doesn't matter in which order the two patches are applied.

That way, the whole file can still be called
manually with some specific LC_ALL value that
will be used for the tests in the first half of
the file, if desired.

--- tests/foad1.sh      2005-06-14 09:42:17 -0400
+++ tests/foad1.sh      2005-06-14 23:21:14 -0400
@@ -82,62 +82,61 @@
 grep_test "LIN7C 55327/" "" -wF -e 5327 -e 5532
-u=cs_CZ.UTF-8
+# The rest of this file is meant to be executed under this locale.
+LC_ALL=cs_CZ.UTF-8; export LC_ALL
[...]

Thanks.  That seems OK to me.  Please commit it if you are happy with it.

- Julian
Index: ChangeLog
===================================================================
RCS file: /cvsroot/grep/grep/ChangeLog,v
retrieving revision 1.247
diff -u -3 -p -d -r1.247 ChangeLog
--- ChangeLog   14 Jun 2005 20:56:41 -0000      1.247
+++ ChangeLog   15 Jun 2005 10:47:37 -0000
@@ -1,3 +1,15 @@
+2005-06-15  Julian Foad  <address@hidden>
+
+       Fix a bug in the Boyer-Moore code in handling certain potential
+       matches over 255 bytes long, resulting in spurious or missed matches.
+       Thanks to Steve Summit for the bug report and reproduction recipe,
+       and to Stepan Kasal for the fix and test cases.
+
+       * src/kwset.c
+         (MIN): New macro.
+         (kwsprep): Ensure that "delta" values don't overflow.
+       * tests/foad1.sh: Add regression tests for this.
+
 2005-06-14  Charles Levert  <address@hidden>
 
        Fix bug #11022 (Line wrapping causes GREP_COLOR background
Index: src/kwset.c
===================================================================
RCS file: /cvsroot/grep/grep/src/kwset.c,v
retrieving revision 1.6
diff -u -3 -p -d -r1.6 kwset.c
--- src/kwset.c 2 May 2005 09:47:48 -0000       1.6
+++ src/kwset.c 15 Jun 2005 10:47:37 -0000
@@ -46,6 +46,9 @@ extern char *xmalloc();
 #define obstack_chunk_alloc malloc
 #define obstack_chunk_free free
 
+#undef MIN
+#define MIN(A,B) ((A) < (B) ? (A) : (B))
+
 /* Balanced tree of edges and labels leaving a given trie node. */
 struct tree
 {
@@ -404,9 +407,10 @@ kwsprep (kwset_t kws)
          kwset->target[i] = curr->links->label;
          curr = curr->links->trie;
        }
-      /* Build the Boyer Moore delta.  Boy that's easy compared to CW. */
+      /* Build the Boyer-Moore delta.  Boy that's easy compared to CW. */
       for (i = 0; i < kwset->mind; ++i)
-       delta[(unsigned char) kwset->target[i]] = kwset->mind - (i + 1);
+       delta[(unsigned char) kwset->target[i]] =
+         MIN(kwset->mind - (i + 1), UCHAR_MAX);
       kwset->mind2 = kwset->mind;
       /* Find the minimal delta2 shift that we might make after
         a backwards match has failed. */
Index: tests/foad1.sh
===================================================================
RCS file: /cvsroot/grep/grep/tests/foad1.sh,v
retrieving revision 1.8
diff -u -3 -p -d -r1.8 foad1.sh
--- tests/foad1.sh      14 Jun 2005 20:56:42 -0000      1.8
+++ tests/foad1.sh      15 Jun 2005 10:47:37 -0000
@@ -82,6 +82,19 @@ grep_test "A/CX/B/C/" "A/B/C/" -wF -e A 
 grep_test "LIN7C 55327/" "" -wF -e 5327 -e 5532
 
 
+# Test for Boyer-Moore bugs.  Thanks to Steve Summit for reporting the bug
+# and providing a test case which inspired these.
+b17='bbbbbbbbbbbbbbbbb'
+b85="$b17$b17$b17$b17$b17"
+b255="$b85$b85$b85"
+x16='xxxxxxxxxxxxxxxx'
+x64="$x16$x16$x16$x16"
+x256="$x64$x64$x64$x64"
+bm="cbbba${b255}c"
+grep_test "a${b255}a" "" "a${b255}b"
+grep_test "${x256}${bm}${x16}xx" "${x256}${bm}${x16}xx/" "$bm"
+
+
 u=cs_CZ.UTF-8
 # If the UTF-8 locale doesn't work, skip these tests silently.
 if LC_ALL="$u" locale -k LC_CTYPE 2>/dev/null |

reply via email to

[Prev in Thread] Current Thread [Next in Thread]