Re: grep -F causes infinite loop
From: Jim Meyering
Subject: Re: grep -F causes infinite loop
Date: Tue, 28 May 2013 05:11:35 +0200

Jim Meyering wrote:
> GOTO, Daisuke wrote:
>> Hello, there,
>>
>> (Since I was mistaken in the e-mail place, I re-mail.)
>>
>> grep -F causes infinite loop in a text which LOCALE differ.
>> (LOCALE is a ja_JP.UTF-8, and text is a SJIS)
>>
>> It did not occur with an old version(GNU grep 2.6.1 or before).
>> Moreover, also when there is no LOCALE, it does not occur.
> ...
>> # printf '\202\240\202\240' | grep -F $'\202\240'
>
> Thank you very much for that bug report.
> This infloops for me on F18, and probably in any UTF-8 locale:
>
> $ printf '\202\240\202\240' | LC_ALL=en_US.UTF-8 grep $'\202\240'
>
> Here's one way to fix it, making it so grep reports no match.
> While it's nearly the smallest change to avoid the infloop,
> I'm debating whether we need something else.

I've included a complete patch below.
Here's an even smaller example (the -F is required):

  printf '\202x\202' | LC_ALL=en_US.UTF-8 src/grep -F $'\202'

For now, I prefer the minimal infloop-avoiding fix.
A future change may add the other behavior I suggested:
rejecting an invalid multibyte search string with a diagnostic
and exit status 2.
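
As a rough illustration of what that stricter check could look like, here is a
minimal sketch that walks a search string with mbrlen and reports whether every
byte sequence is valid in the current locale. The name is_valid_mbs is
hypothetical and is not part of grep or of the patch below; a caller would
still have to print the diagnostic and exit with status 2 itself.

```c
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not from grep: return true if the N bytes at S
   form only complete, valid multibyte characters in the current
   locale.  */
static bool
is_valid_mbs (char const *s, size_t n)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);

  char const *p = s;
  char const *end = s + n;
  while (p < end)
    {
      size_t len = mbrlen (p, end - p, &st);

      /* (size_t) -1: invalid sequence; (size_t) -2: incomplete one.  */
      if (len == (size_t) -1 || len == (size_t) -2)
        return false;

      /* mbrlen returns 0 for a NUL byte; step over it explicitly.  */
      p += len ? len : 1;
    }
  return true;
}
```

The result depends on the locale selected with setlocale, so the lone byte
\202 from the example above would only be rejected once a UTF-8 locale is
active.
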
[I note that indentation in Makefile.am was inconsistent,
and am fixing that in a separate patch. ]
From 1de170009dac88a2e2300e8d3e8f4aa9b64e9343 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Mon, 27 May 2013 19:54:55 -0700
Subject: [PATCH] grep -F: avoid an infinite loop with invalid multi-byte
search string
* src/kwsearch.c (Fexecute): Avoid an infinite loop when processing
a fixed (-F) multibyte search string that is an invalid byte sequence
in the current locale and that matches the bytes of the input twice
on a line. Reported by Daisuke GOTO in
http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4773
* tests/invalid-multibyte-infloop: New test.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Bug fixes): Mention it.
---
NEWS | 4 ++++
THANKS | 1 +
src/kwsearch.c | 6 ++----
tests/Makefile.am | 1 +
tests/invalid-multibyte-infloop | 20 ++++++++++++++++++++
5 files changed, 28 insertions(+), 4 deletions(-)
create mode 100755 tests/invalid-multibyte-infloop
diff --git a/NEWS b/NEWS
index 5dc0a8c..407e0b0 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ GNU grep NEWS -*- outline -*-
** Bug fixes
+ grep -F would get stuck in an infinite loop when given a search string
+ that is an invalid byte sequence in the current locale and that matches
+ the bytes of the input twice on a line. Now grep fails with exit status 1.
+
grep -P could misbehave. While multi-byte mode is only supported by PCRE
with UTF-8 locales, grep did not activate it. This would cause failures
to match multibyte characters against some regular expressions, especially
diff --git a/THANKS b/THANKS
index edbfbfc..1a1901c 100644
--- a/THANKS
+++ b/THANKS
@@ -19,6 +19,7 @@ Bruno Haible <address@hidden>
Christian Groessler <address@hidden>
Corinna Vinschen <address@hidden>
Dagobert Michelsen <address@hidden>
+Daisuke GOTO <address@hidden>
David Clissold <address@hidden>
David J MacKenzie <address@hidden>
David O'Brien <address@hidden>
diff --git a/src/kwsearch.c b/src/kwsearch.c
index 8551025..51d1ad3 100644
--- a/src/kwsearch.c
+++ b/src/kwsearch.c
@@ -111,11 +111,9 @@ Fexecute (char const *buf, size_t size, size_t *match_size,
           mbstate_t s;
           memset (&s, 0, sizeof s);
           size_t mb_len = mbrlen (mb_start, (buf + size) - (beg + offset), &s);
-          if (mb_len == (size_t) -2)
+          if (mb_len == (size_t) -2 || mb_len == (size_t) -1)
             goto failure;
-          beg = mb_start;
-          if (mb_len != (size_t) -1)
-            beg += mb_len - 1;
+          beg = mb_start + mb_len - 1;
           continue;
         }
       beg += offset;
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 9eb31aa..3f585fa 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -66,6 +66,7 @@ TESTS = \
in-eq-out-infloop \
include-exclude \
inconsistent-range \
+ invalid-multibyte-infloop \
khadafy \
max-count-vs-context \
empty-line-mb \
diff --git a/tests/invalid-multibyte-infloop b/tests/invalid-multibyte-infloop
new file mode 100755
index 0000000..ad20bb3
--- /dev/null
+++ b/tests/invalid-multibyte-infloop
@@ -0,0 +1,20 @@
+#!/bin/sh
+# Ensure that grep -F does not infloop on an invalid multibyte search string.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+require_en_utf8_locale_
+require_compiled_in_MB_support
+require_timeout_
+
+printf '\202' > search-str || framework_failure_
+cat search-str search-str > input || framework_failure_
+
+fail=0
+
+# Before 2.15, this would infloop.
+LC_ALL=en_US.UTF-8 timeout 3 grep -F -f search-str input > out
+test $? = 1 || fail=1
+test -s out && fail=1
+
+Exit $fail
--
1.8.3
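
For anyone reading the src/kwsearch.c hunk in isolation, the change in that
branch can be modeled with two small functions. This is only a sketch: the
names resume_before_patch and resume_after_patch are hypothetical, positions
are plain offsets rather than the pointers Fexecute uses, and a false return
stands in for "goto failure" while true stands in for "continue".

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of the branch before the patch.  Only an incomplete sequence
   ((size_t) -2) stopped the search; an invalid one ((size_t) -1) reset
   the position to MB_START and let the loop keep going.  */
static bool
resume_before_patch (size_t mb_start, size_t mb_len, size_t *beg)
{
  if (mb_len == (size_t) -2)
    return false;               /* "goto failure" */
  *beg = mb_start;
  if (mb_len != (size_t) -1)
    *beg += mb_len - 1;
  return true;                  /* "continue" */
}

/* Model of the branch after the patch.  Both mbrlen error returns end
   the search, so the loop cannot revisit the same invalid bytes.  */
static bool
resume_after_patch (size_t mb_start, size_t mb_len, size_t *beg)
{
  if (mb_len == (size_t) -2 || mb_len == (size_t) -1)
    return false;               /* "goto failure" */
  *beg = mb_start + mb_len - 1;
  return true;                  /* "continue" */
}
```

For a valid character both versions compute the same position; they differ
only when mbrlen reports an invalid sequence, which is the case the new test
exercises.
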