bug-grep

Re: grep -F causes infinite loop


From: Jim Meyering
Subject: Re: grep -F causes infinite loop
Date: Tue, 28 May 2013 05:11:35 +0200

Jim Meyering wrote:
> GOTO, Daisuke wrote:
>> Hello, there,
>>
>> (I sent this to the wrong address earlier, so I am sending it again.)
>>
>> grep -F causes an infinite loop on text whose encoding differs from the locale.
>> (The locale is ja_JP.UTF-8, and the text is in SJIS.)
>>
>> This did not occur with older versions (GNU grep 2.6.1 and earlier).
>> It also does not occur when no locale is set.
> ...
>> # printf '\202\240\202\240' | grep -F $'\202\240'
>
> Thank you very much for that bug report.
> This infloops for me on F18, and probably in any UTF-8 locale:
>
>     $ printf '\202\240\202\240' | LC_ALL=en_US.UTF-8 grep $'\202\240'
>
> Here's one way to fix it, making it so grep reports no match.
> While it's nearly the smallest change to avoid the infloop,
> I'm debating whether we need something else.

I've included a complete patch below.
Here's an even smaller example (the -F is required):

    printf '\202x\202' | LC_ALL=en_US.UTF-8 src/grep -F $'\202'
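Both reproducers rely on the same property of UTF-8: bytes 0x80-0xBF (such as \202 = 0x82 and \240 = 0xA0) can only continue a character, never start one, so in a UTF-8 locale mbrlen reports them as an invalid sequence ((size_t) -1). In Shift_JIS, by contrast, \202\240 is the single character U+3042 (HIRAGANA LETTER A). The helper below is a hypothetical, locale-independent sketch that classifies a UTF-8 lead byte per RFC 3629; it is not part of grep and does not check continuation bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not grep code): return the length of the UTF-8
   sequence that lead byte B starts, or 0 if B can never begin a
   character.  Only the lead byte is examined.  */
static size_t
utf8_lead_len (unsigned char b)
{
  if (b <= 0x7F)
    return 1;                   /* ASCII */
  if (b <= 0xC1)
    return 0;                   /* 0x80-0xBF continuation; 0xC0-0xC1 overlong */
  if (b <= 0xDF)
    return 2;
  if (b <= 0xEF)
    return 3;
  if (b <= 0xF4)
    return 4;
  return 0;                     /* 0xF5-0xFF never occur in UTF-8 */
}
```

For example, utf8_lead_len (0x82) and utf8_lead_len (0xA0) both return 0, which is why neither search string can match a whole character in en_US.UTF-8 or ja_JP.UTF-8.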

For now, I prefer the minimal infloop-avoiding fix.
A future patch may make the additional change I suggested:
rejecting an invalid multibyte search string with a diagnostic
and exit status 2.
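The stricter alternative would need a way to tell, up front, whether a search string is valid in the current locale. A minimal sketch, assuming the caller has already called setlocale (LC_ALL, "") and that the string is passed as a pointer and byte count; the function name valid_mb_string is hypothetical and nothing like it is added by the patch below:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not grep code): return true if the N bytes at S
   are a sequence of complete, valid characters in the current locale.  */
static bool
valid_mb_string (char const *s, size_t n)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);
  while (n > 0)
    {
      size_t len = mbrlen (s, n, &st);
      if (len == (size_t) -1 || len == (size_t) -2)
        return false;           /* invalid, or truncated at the end */
      if (len == 0)
        len = 1;                /* an embedded NUL byte: step over it */
      s += len;
      n -= len;
    }
  return true;
}
```

In a UTF-8 locale this returns false for both "\202" and the reporter's "\202\240", and true for "\343\201\202" (U+3042 in UTF-8); a false result is where such a program would print a diagnostic and exit with status 2.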

[I noticed that the indentation in Makefile.am was inconsistent
and am fixing that in a separate patch.]

From 1de170009dac88a2e2300e8d3e8f4aa9b64e9343 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Mon, 27 May 2013 19:54:55 -0700
Subject: [PATCH] grep -F: avoid an infinite loop with invalid multi-byte
 search string

* src/kwsearch.c (Fexecute): Avoid an infinite loop when processing
a fixed (-F) multibyte search string that is an invalid byte sequence
in the current locale and that matches the bytes of the input twice
on a line.  Reported by Daisuke GOTO in
http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4773
* tests/invalid-multibyte-infloop: New test.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Mention it.
---
 NEWS                            |  4 ++++
 THANKS                          |  1 +
 src/kwsearch.c                  |  6 ++----
 tests/Makefile.am               |  1 +
 tests/invalid-multibyte-infloop | 20 ++++++++++++++++++++
 5 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100755 tests/invalid-multibyte-infloop

diff --git a/NEWS b/NEWS
index 5dc0a8c..407e0b0 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ GNU grep NEWS                                    -*- outline -*-

 ** Bug fixes

+  grep -F would get stuck in an infinite loop when given a search string
+  that is an invalid byte sequence in the current locale and that matches
+  the bytes of the input twice on a line.  Now grep fails with exit status 1.
+
   grep -P could misbehave.  While multi-byte mode is only supported by PCRE
   with UTF-8 locales, grep did not activate it.  This would cause failures
   to match multibyte characters against some regular expressions, especially
diff --git a/THANKS b/THANKS
index edbfbfc..1a1901c 100644
--- a/THANKS
+++ b/THANKS
@@ -19,6 +19,7 @@ Bruno Haible               <address@hidden>
 Christian Groessler        <address@hidden>
 Corinna Vinschen           <address@hidden>
 Dagobert Michelsen         <address@hidden>
+Daisuke GOTO               <address@hidden>
 David Clissold             <address@hidden>
 David J MacKenzie          <address@hidden>
 David O'Brien              <address@hidden>
diff --git a/src/kwsearch.c b/src/kwsearch.c
index 8551025..51d1ad3 100644
--- a/src/kwsearch.c
+++ b/src/kwsearch.c
@@ -111,11 +111,9 @@ Fexecute (char const *buf, size_t size, size_t *match_size,
           mbstate_t s;
           memset (&s, 0, sizeof s);
           size_t mb_len = mbrlen (mb_start, (buf + size) - (beg + offset), &s);
-          if (mb_len == (size_t) -2)
+          if (mb_len == (size_t) -2 || mb_len == (size_t) -1)
             goto failure;
-          beg = mb_start;
-          if (mb_len != (size_t) -1)
-            beg += mb_len - 1;
+          beg = mb_start + mb_len - 1;
           continue;
         }
       beg += offset;
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 9eb31aa..3f585fa 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -66,6 +66,7 @@ TESTS =                                               \
   in-eq-out-infloop                             \
   include-exclude                              \
   inconsistent-range                            \
+  invalid-multibyte-infloop                     \
   khadafy                                      \
   max-count-vs-context                         \
   empty-line-mb                                        \
diff --git a/tests/invalid-multibyte-infloop b/tests/invalid-multibyte-infloop
new file mode 100755
index 0000000..ad20bb3
--- /dev/null
+++ b/tests/invalid-multibyte-infloop
@@ -0,0 +1,20 @@
+#!/bin/sh
+# Test that grep -F does not infloop on an invalid multibyte search string.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+require_en_utf8_locale_
+require_compiled_in_MB_support
+require_timeout_
+
+printf '\202' > search-str || framework_failure_
+cat search-str search-str > input || framework_failure_
+
+fail=0
+
+# Before 2.15, this would infloop.
+LC_ALL=en_US.UTF-8 timeout 3 grep -F -f search-str input > out
+test $? = 1 || fail=1
+test -s out && fail=1
+
+Exit $fail
--
1.8.3
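The src/kwsearch.c hunk is easier to compare when both versions of the branch are pulled out into small pure functions. The sketch below does that; the function names are hypothetical and the code is a restatement of the hunk, not grep's own code. MB_START is the start of the multibyte character that contains the match and MB_LEN is mbrlen's result for it. The "- 1" presumably compensates for the enclosing loop incrementing BEG before the next search, which the hunk does not show.

```c
#include <stddef.h>

/* Hypothetical restatements (not grep code) of the branch in Fexecute
   taken when a match lies inside a multibyte character.  Each returns
   the new BEG, or NULL where the code does "goto failure".
   Assumes MB_LEN != 0.  */

/* Before the patch.  */
static char const *
old_next_beg (char const *mb_start, size_t mb_len)
{
  if (mb_len == (size_t) -2)    /* incomplete character: fail */
    return NULL;
  char const *beg = mb_start;
  if (mb_len != (size_t) -1)    /* invalid sequence: BEG is reset to MB_START */
    beg += mb_len - 1;
  return beg;
}

/* After the patch.  */
static char const *
new_next_beg (char const *mb_start, size_t mb_len)
{
  /* Reject both "incomplete" ((size_t) -2) and "invalid" ((size_t) -1)
     before doing any pointer arithmetic with MB_LEN.  */
  if (mb_len == (size_t) -2 || mb_len == (size_t) -1)
    return NULL;
  return mb_start + mb_len - 1;
}
```

With mb_len == (size_t) -1, old_next_beg hands back mb_start and the search resumes from there, while new_next_beg reports failure, i.e. no match. That agrees with the new test, which expects exit status 1 and empty output.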


