bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Latest snapshot and glibc regex support


From: Jim Meyering
Subject: Re: Latest snapshot and glibc regex support
Date: Sat, 20 Mar 2010 11:07:30 +0100

Matthew Burgess wrote:
> On 19/03/2010 10:03, Jim Meyering wrote:
>> We expect a reversed range to make grep give a diagnostic and exit 2.
>> With glibc (--without-included-regex), you get no diagnostic and exit 1.
>
> OK, following the gnulib change at
> http://lists.gnu.org/archive/html/bug-gnulib/2010-02/msg00006.html and
> subsequent bug report at
> http://sourceware.org/bugzilla/show_bug.cgi?id=11244 I've come up with
> the attached patch (against the latest snapshot).  It results in grep
> using Glibc's regex implementation and all tests pass.
>
> However, I will admit (hopefully before the laughter starts!) that I'm
> not an experienced C coder, so the patch might just be a fluke!  I am
> certainly more than happy to receive constructive criticism on its
> contents or format though.

Thanks for contributing!
For future reference, please see grep's README-hacking
and http://git.sv.gnu.org/cgit/coreutils.git/plain/HACKING
(yes, I should add that in grep's repository, too)
You would notice that "configure" is not version-controlled,
since it's generated, and that the pieces that contribute to
its creation come almost entirely from gnulib.

> diff -Naur grep-2.5.4.183-9159.orig/configure grep-2.5.4.183-9159/configure
> --- grep-2.5.4.183-9159.orig/configure        2010-03-17 17:01:06.000000000 
> +0000
> +++ grep-2.5.4.183-9159/configure     2010-03-19 22:13:34.000000000 +0000
> @@ -14652,7 +14652,7 @@
>                return 1;
>
>              /* Ensure that [b-a] is diagnosed as invalid. */
> -            re_set_syntax (RE_SYNTAX_POSIX_EGREP);
> +            re_set_syntax (RE_SYNTAX_POSIX_EGREP | RE_NO_EMPTY_RANGES);

That's precisely the change I made.
However, the place to make it is in gnulib.
Here's the discussion, and a related change:

    http://article.gmane.org/gmane.comp.lib.gnulib.bugs/21015
    http://article.gmane.org/gmane.comp.lib.gnulib.bugs/21016

> diff -Naur grep-2.5.4.183-9159.orig/src/search.c 
> grep-2.5.4.183-9159/src/search.c
> --- grep-2.5.4.183-9159.orig/src/search.c     2010-03-17 10:23:12.000000000 
> +0000
> +++ grep-2.5.4.183-9159/src/search.c  2010-03-19 22:14:06.000000000 +0000
> @@ -368,7 +368,7 @@
>
>  COMPILE_FCT(Ecompile)
>  {
> -  return GEAcompile (pattern, size, RE_SYNTAX_POSIX_EGREP);
> +  return GEAcompile (pattern, size, RE_SYNTAX_POSIX_EGREP | 
> RE_NO_EMPTY_RANGES);

Thanks.
That's part of the patch, and it does cause "make check" to pass
even with the latest from gnulib.
Your change makes this command give a diagnostic and exit with status 2,
as it should:

  $ :|./grep -E '[b-a]'
  ./grep: Invalid range end

but not these two others (which would give no diagnostic and still
exit with status 1):

  :|./grep  '[b-a]'
  :|./egrep '[b-a]'

So I've done three things:
  - add a test to provide coverage of those two other cases
  - update to latest gnulib, avoiding a spurious test failure
  - add RE_NO_EMPTY_RANGES in the other two places

I'll push the following three patches shortly:

>From bf498a94165496a987f3ac800d6542028ea85a2c Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sat, 20 Mar 2010 10:52:20 +0100
Subject: [PATCH 1/3] reject reversed-endpoint ranges, with all regex variants

* src/search.c: Add RE_NO_EMPTY_RANGES to the syntax bits
in three places, so that all of grep, egrep, and grep -E reject
a range with reversed endpoints like '[b-a]'.  This is required,
when using the latest version of gnulib's regex module, since it
now honors the RE_NO_EMPTY_RANGES flag, rather than acting as if
it were always set.
Based on a change by Matthew Burgess.
---
 src/search.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/search.c b/src/search.c
index c986d48..f2d9ed6 100644
--- a/src/search.c
+++ b/src/search.c
@@ -260,7 +260,7 @@ is_mb_middle(const char **good, const char *buf, const char 
*end)
 #ifdef EGREP_PROGRAM
 COMPILE_FCT(Ecompile)
 {
-  reg_syntax_t syntax_bits = RE_SYNTAX_POSIX_EGREP;
+  reg_syntax_t syntax_bits = RE_SYNTAX_POSIX_EGREP | RE_NO_EMPTY_RANGES;
 #else
 /* No __VA_ARGS__ in C89.  So we have to do it this way.  */
 static COMPILE_RET
@@ -358,7 +358,9 @@ GEAcompile (char const *pattern, size_t size, reg_syntax_t 
syntax_bits)
 COMPILE_FCT(Gcompile)
 {
   return GEAcompile (pattern, size,
-                    RE_SYNTAX_GREP | RE_HAT_LISTS_NOT_NEWLINE);
+                    (RE_SYNTAX_GREP
+                     | RE_HAT_LISTS_NOT_NEWLINE
+                     | RE_NO_EMPTY_RANGES));
 }

 COMPILE_FCT(Acompile)
@@ -368,7 +370,7 @@ COMPILE_FCT(Acompile)

 COMPILE_FCT(Ecompile)
 {
-  return GEAcompile (pattern, size, RE_SYNTAX_POSIX_EGREP);
+  return GEAcompile (pattern, size, RE_SYNTAX_POSIX_EGREP | 
RE_NO_EMPTY_RANGES);
 }
 #endif /* !EGREP_PROGRAM */

--
1.7.0.2.455.g91132


>From 812f1dbb1b54a63eff06aaea1d2038f31eb9ae0e Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sat, 20 Mar 2010 09:40:30 +0100
Subject: [PATCH 2/3] build: update gnulib submodule to latest

This pulls in the latest regex module from gnulib, including a fix
to make it honor the RE_NO_EMPTY_RANGES syntax bit.

tests: temporarily disable irrelevant-to-grep failing C++ fcntl-h-tests
* bootstrap.conf (gnulib_tool_option_extras): Temporarily add
--avoid=fcntl-h-tests, until the C++ part of that test is fixed.
---
 bootstrap.conf       |    1 +
 build-aux/.gitignore |    1 +
 gnulib               |    2 +-
 3 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/bootstrap.conf b/bootstrap.conf
index 0f38fcb..2170e19 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -124,6 +124,7 @@ fi
 test -f ChangeLog || touch ChangeLog || exit 1

 gnulib_tool_option_extras="--tests-base=$bt/gnulib-tests --with-tests"
+gnulib_tool_option_extras="$gnulib_tool_option_extras --avoid=fcntl-h-tests"

 # Build prerequisites
 buildreq="\
diff --git a/build-aux/.gitignore b/build-aux/.gitignore
index ea3ae09..38b748c 100644
--- a/build-aux/.gitignore
+++ b/build-aux/.gitignore
@@ -1,5 +1,6 @@
 announce-gen
 arg-nonnull.h
+c++defs.h
 config.guess
 config.rpath
 config.sub
diff --git a/gnulib b/gnulib
index aadf332..602e3e6 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit aadf332a5c5f209affd38c35b4e9faaa8a0adecb
+Subproject commit 602e3e6b709592f883ebb7bf58df1f955ea4b8f2
--
1.7.0.2.455.g91132


>From 159c02545be1bd4342d27c7ea5b9d06459d3c8aa Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sat, 20 Mar 2010 10:21:32 +0100
Subject: [PATCH 3/3] tests: ensure that all programs handle [b-a] consistently

* tests/reversed-range-endpoints: New test.
* tests/Makefile.am (TESTS): Add it.
---
 tests/Makefile.am              |    1 +
 tests/reversed-range-endpoints |   20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 0 deletions(-)
 create mode 100644 tests/reversed-range-endpoints

diff --git a/tests/Makefile.am b/tests/Makefile.am
index d16cc21..67763b2 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -36,6 +36,7 @@ TESTS =                                               \
   options.sh                                   \
   pcre.sh                                      \
   pcre-z                                       \
+  reversed-range-endpoints                     \
   spencer1.sh                                  \
   spencer1-locale                              \
   status.sh                                    \
diff --git a/tests/reversed-range-endpoints b/tests/reversed-range-endpoints
new file mode 100644
index 0000000..e80c07a
--- /dev/null
+++ b/tests/reversed-range-endpoints
@@ -0,0 +1,20 @@
+#!/bin/sh
+# Ensure that an invalid range like [b-a] evokes exit status of 2.
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+fail=0
+
+printf 'Invalid range end\n' > exp
+for prog in grep egrep 'grep -E'; do
+  $prog '[b-a]' < /dev/null > out 2>&1
+  # exit status must be 2, not 1
+  test $? = 2 || fail=1
+
+  # Remove "program_name: " prefix from actual output.
+  sed 's/^[a-z]*: //' out > k && mv k out
+
+  compare out exp || fail=1
+done
+
+Exit $fail
--
1.7.0.2.455.g91132




reply via email to

[Prev in Thread] Current Thread [Next in Thread]