bug-grep

Re: grep-2.6 is imminent: pending patches, bug reports?


From: Jim Meyering
Subject: Re: grep-2.6 is imminent: pending patches, bug reports?
Date: Thu, 04 Mar 2010 23:06:14 +0100

address@hidden wrote:
>>  echo Y | LC_ALL=en_US.UTF-8 ./grep -i '[y]'
>
> I think gawk dfa fixes this. It rings a vague bell....

That one at least is fixed by syncing from gawk's dfa.c.
Here's the patch I've just written.

Debian's 61-dfa.c-case_fold-charclass.patch had many superfluous casts,
but appeared to be semantically equivalent to the dfa.c change below.


From 4a0f966463ed44e90958aa75f048dace7edd3649 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Thu, 4 Mar 2010 22:23:06 +0100
Subject: [PATCH] fix a bug in handling of -i and character classes

* src/dfa.c (parse_bracket_exp_mb): Sync from gawk's dfa.c.
* tests/case-fold-char-class: New file.  Test for the bug.
* tests/Makefile.am (TESTS): Add it.
(TESTS_ENVIRONMENT): Propagate LOCALE_FR and LOCALE_FR_UTF8
definitions into tests.
* NEWS (Bug fixes): Mention it.
---
 NEWS                       |    3 ++
 src/dfa.c                  |    7 +++++
 tests/Makefile.am          |   57 +++++++++++++++++++++++--------------------
 tests/case-fold-char-class |   14 ++++++++++
 4 files changed, 54 insertions(+), 27 deletions(-)
 create mode 100644 tests/case-fold-char-class

diff --git a/NEWS b/NEWS
index 70881c7..6685967 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ GNU grep NEWS                                    -*- outline -*-

 ** Bug fixes

+  grep -i with a character class would malfunction in multi-byte locales.
+  For example, echo Y | LC_ALL=en_US.UTF-8 grep -i '[y]' would print nothing.
+
   grep would mistakenly exit with status 1 upon error, rather than 2,
   as it is documented to do.

diff --git a/src/dfa.c b/src/dfa.c
index 60ec372..09c0c96 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -654,6 +654,13 @@ parse_bracket_exp_mb (void)
          REALLOC_IF_NECESSARY(work_mbc->chars, wchar_t, chars_al,
                               work_mbc->nchars + 1);
          work_mbc->chars[work_mbc->nchars++] = (wchar_t)wc;
+         if (case_fold && (iswlower(wc) || iswupper(wc)))
+           {
+             REALLOC_IF_NECESSARY(work_mbc->chars, wchar_t, chars_al,
+                                  work_mbc->nchars + 1);
+             work_mbc->chars[work_mbc->nchars++] =
+               (wchar_t) (iswlower(wc) ? towupper(wc) : towlower(wc));
+           }
        }
     }
   while ((wc = wc1) != L']');
diff --git a/tests/Makefile.am b/tests/Makefile.am
index cee1fa4..276209d 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -14,35 +14,36 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.

-TESTS =                \
-  backref.sh   \
-  bre.sh       \
-  empty.sh     \
-  ere.sh       \
-  file.sh      \
-  fmbtest.sh   \
-  foad1.sh     \
-  help-version \
-  khadafy.sh   \
-  max-count-vs-context \
-  options.sh   \
-  pcre.sh      \
-  spencer1.sh  \
-  status.sh    \
-  warning.sh   \
-  word-multi-file \
+TESTS =                                                \
+  backref.sh                                   \
+  bre.sh                                       \
+  case-fold-char-class                         \
+  empty.sh                                     \
+  ere.sh                                       \
+  file.sh                                      \
+  fmbtest.sh                                   \
+  foad1.sh                                     \
+  help-version                                 \
+  khadafy.sh                                   \
+  max-count-vs-context                         \
+  options.sh                                   \
+  pcre.sh                                      \
+  spencer1.sh                                  \
+  status.sh                                    \
+  warning.sh                                   \
+  word-multi-file                              \
   yesno.sh

-EXTRA_DIST =   \
-  $(TESTS)     \
-  bre.awk      \
-  bre.tests    \
-  ere.awk      \
-  ere.tests    \
-  init.sh      \
-  khadafy.lines        \
-  khadafy.regexp \
-  spencer1.awk \
+EXTRA_DIST =                                   \
+  $(TESTS)                                     \
+  bre.awk                                      \
+  bre.tests                                    \
+  ere.awk                                      \
+  ere.tests                                    \
+  init.sh                                      \
+  khadafy.lines                                        \
+  khadafy.regexp                               \
+  spencer1.awk                                 \
   spencer1.tests

 CLEANFILES = \
@@ -69,6 +70,8 @@ TESTS_ENVIRONMENT =                           \
     fi;                                                \
   };                                           \
   export                                       \
+  LOCALE_FR='$(LOCALE_FR)'                      \
+  LOCALE_FR_UTF8='$(LOCALE_FR_UTF8)'            \
   AWK=$(AWK)                                   \
   GREP=$(top_builddir)/src/grep                        \
   GREP_OPTIONS=''                              \
diff --git a/tests/case-fold-char-class b/tests/case-fold-char-class
new file mode 100644
index 0000000..c36b314
--- /dev/null
+++ b/tests/case-fold-char-class
@@ -0,0 +1,14 @@
+#!/bin/sh
+# This would fail for grep-2.5.3
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+printf 'Y\n'      > exp || framework_failure
+fail=0
+
+for LOC in en_US.UTF-8 zh_CN $LOCALE_FR_UTF8; do
+  printf 'X\nY\nZ\n' | LC_ALL=$LOC grep -i '[y]' > out || fail=1
+  compare out exp || fail=1
+done
+
+Exit $fail
--
1.7.0.1.300.gd855a
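
For readers who want to see the idea of the dfa.c hunk in isolation, here is
a minimal standalone sketch. It is not the code from grep or gawk: the names
add_char_folded and set_contains are hypothetical, and a caller-supplied
fixed-capacity array stands in for the REALLOC_IF_NECESSARY growth used in
dfa.c. It only illustrates the technique, namely that when case folding is on,
each cased character of a bracket expression is stored together with its
opposite-case counterpart.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of dfa.c.  Append WC to SET, which has
   room for CAP elements and currently holds *N.  If CASE_FOLD is nonzero
   and WC is a cased letter, also append its opposite-case counterpart,
   mirroring the logic added to parse_bracket_exp_mb.
   Return 0 on success, -1 if SET is too small.  */
int
add_char_folded (wchar_t *set, size_t cap, size_t *n, wint_t wc,
                 int case_fold)
{
  assert (set != NULL && n != NULL);

  if (*n >= cap)
    return -1;
  set[(*n)++] = (wchar_t) wc;

  if (case_fold && (iswlower (wc) || iswupper (wc)))
    {
      if (*n >= cap)
        return -1;
      set[(*n)++] = (wchar_t) (iswlower (wc) ? towupper (wc) : towlower (wc));
    }
  return 0;
}

/* Hypothetical helper: return nonzero if WC occurs in SET[0..N-1].  */
int
set_contains (const wchar_t *set, size_t n, wchar_t wc)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (set[i] == wc)
      return 1;
  return 0;
}
```

With case_fold set, adding L'y' to an empty set should leave both L'y' and
L'Y' in it, so a later lookup of L'Y' succeeds. With case_fold clear, only
L'y' is stored, which corresponds to the pre-patch behavior that made
grep -i '[y]' miss "Y" in multi-byte locales.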



