bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 1/2] grep: diagnose and exit-2 for bogus REs like [:space:], [:di


From: Jim Meyering
Subject: [PATCH 1/2] grep: diagnose and exit-2 for bogus REs like [:space:], [:digit:], etc.
Date: Wed, 01 Sep 2010 19:09:30 +0200

I've just tweaked the documentation and pushed this:

>From adf0bf94a10d40f39d72663878a245163232aae6 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 31 Aug 2010 21:14:41 +0200
Subject: [PATCH 1/2] grep: diagnose and exit-2 for bogus REs like [:space:], 
[:digit:], etc.

When I make a mistake like this:
  grep '[:lower:]' ...
be it in a script or on the command line, I want to know about
it as soon as possible.  I don't want grep to print a mere warning
that it is interpreting this suspicious and almost guaranteed-wrong
regular expression as a set of just 6 bytes.  And I certainly don't
want grep to silently do the wrong thing, even if that would be
officially standards-conforming.  It's obvious that I intended
[[:lower:]], and I want my error to be diagnosed in a way that is
most likely to get my attention.  Thus, with this change, grep now
prints a diagnostic and exits with status 2 the moment it
encounters an offending [:char_class:] construct.

This changes the way grep works by default, rather than
putting this new behavior on an option.  A new option
would seldom be used in scripts (not portable), and would
probably be used only rarely by those who need it the most.
This new functionality provides a valuable safety measure
and incurs truly negligible risk.

For strict POSIX compliance, set POSIXLY_CORRECT in
your environment.  That disables this new feature.

Revert the changes from commit 2cd3bcea, "grep: add
--warnings={always,never,auto}.", and then do the following:

* src/dfasearch.c (dfawarn): Call getenv("POSIXLY_CORRECT") here;
Remove "warning: " from the diagnostic, now that it's more than
a warning, and exit with status 2.
* NEWS (New features): Describe the new semantics.
* tests/warn-char-classes: Adjust one test to accommodate this change.
* doc/grep.texi (Character Classes and Bracket Expressions): Document.
(Environment Variables): Cross-reference it.
Remove reference to obsolete getopt illegal vs. invalid difference.
Thanks to Paul Eggert for suggestions and an initial prod.
---
 NEWS                    |   13 +++++----
 doc/grep.texi           |   15 ++++++----
 src/dfasearch.c         |   20 +++++++++-----
 src/grep.h              |    2 -
 src/main.c              |   67 +++++++----------------------------------------
 tests/warn-char-classes |    6 +++-
 6 files changed, 43 insertions(+), 80 deletions(-)

diff --git a/NEWS b/NEWS
index 553ec52..18289ba 100644
--- a/NEWS
+++ b/NEWS
@@ -29,12 +29,13 @@ GNU grep NEWS                                    -*- 
outline -*-

 ** New features

-  grep now will warn for very common regular expression mistakes,
-  such as using [:space:] instead of [[:space:]].  Warnings are
-  disabled by POSIXLY_CORRECT.  They are also disabled when stdout
-  is not a TTY, thus minimizing the chance of extraneous output
-  in a script.  However, the rules for enabling/disabling warnings
-  are experimental and subject to change in future releases.
+  grep now diagnoses (and fails with exit status 2) commonly mistyped
+  regular expression like [:space:], [:digit:], etc.  Before, those were
+  silently interpreted as [ac:eps] and [dgit:] respectively.  Virtually
+  all who make that class of mistake should have used [[:space:]] or
+  [[:digit:]].  This new behavior is disabled when the POSIXLY_CORRECT
+  environment variable is set.
+

 * Noteworthy changes in release 2.6.3 (2010-04-02) [stable]

diff --git a/doc/grep.texi b/doc/grep.texi
index 890ba1b..da103c4 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -935,12 +935,8 @@ follow file names must be treated as file names;
 by default,
 such options are permuted to the front of the operand list
 and are treated as options.
-Also,
address@hidden requires that unrecognized options be diagnosed as ``illegal'',
-but since they are not really against the law the default
-is to diagnose them as ``invalid''.
address@hidden also disables @address@hidden,
-described below.
+Also, @code{POSIXLY_CORRECT} disables special handling of an
+invalid bracket expression.  @xref{invalid-bracket-expr}.

 @item address@hidden
 @vindex address@hidden @r{environment variable}
@@ -1269,6 +1265,13 @@ encoding, whereas the former is independent of locale 
and character set.
 part of the symbolic names, and must be included in addition to
 the brackets delimiting the bracket expression.)

address@hidden
+If you mistakenly omit the outer brackets, and search for say, 
@samp{[:upper:]},
+GNU @command{grep} prints a diagnostic and exits with status 2, on
+the assumption that you did not intend to search for the nominally
+equivalent regular expression: @samp{[:epru]}.
+Set the @code{POSIXLY_CORRECT} environment variable to disable this feature.
+
 Most meta-characters lose their special meaning inside bracket expressions.

 @table @samp
diff --git a/src/dfasearch.c b/src/dfasearch.c
index 29a1f7e..44eb08a 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -46,13 +46,6 @@ static struct patterns *patterns;
 static size_t pcount;

 void
-dfawarn (char const *mesg)
-{
-  if (warnings)
-    error (0, 0, _("warning: %s"), mesg);
-}
-
-void
 dfaerror (char const *mesg)
 {
   error (EXIT_TROUBLE, 0, "%s", mesg);
@@ -62,6 +55,19 @@ dfaerror (char const *mesg)
   abort ();
 }

+/* For now, the sole dfawarn-eliciting condition (use of a regexp
+   like '[:lower:]') is unequivocally an error, so treat it as such,
+   when possible.  */
+void
+dfawarn (char const *mesg)
+{
+  static enum { NONE = 0, POSIX, GNU } mode;
+  if (mode == NONE)
+    mode = (getenv ("POSIXLY_CORRECT") ? POSIX : GNU);
+  if (mode == GNU)
+    dfaerror (mesg);
+}
+
 /* Number of compiled fixed strings known to exactly match the regexp.
    If kwsexec returns < kwset_exact_matches, then we don't need to
    call the regexp matcher at all. */
diff --git a/src/grep.h b/src/grep.h
index 64c1865..67ea793 100644
--- a/src/grep.h
+++ b/src/grep.h
@@ -43,5 +43,3 @@ extern int match_icase;               /* -i */
 extern int match_words;                /* -w */
 extern int match_lines;                /* -x */
 extern unsigned char eolbyte;  /* -z */
-
-extern int warnings;
diff --git a/src/main.c b/src/main.c
index 6f5c4ae..0a91b7f 100644
--- a/src/main.c
+++ b/src/main.c
@@ -286,8 +286,7 @@ enum
   LABEL_OPTION,
   EXCLUDE_DIRECTORY_OPTION,
   GROUP_SEPARATOR_OPTION,
-  MMAP_OPTION,
-  WARN_OPTION,
+  MMAP_OPTION
 };

 /* Long options equivalences. */
@@ -343,7 +342,6 @@ static struct option const long_options[] =
   {"binary", no_argument, NULL, 'U'},
   {"unix-byte-offsets", no_argument, NULL, 'u'},
   {"version", no_argument, NULL, 'V'},
-  {"warn", optional_argument, NULL, WARN_OPTION},
   {"with-filename", no_argument, NULL, 'H'},
   {"word-regexp", no_argument, NULL, 'w'},
   {0, 0, 0, 0}
@@ -354,7 +352,6 @@ int match_icase;
 int match_words;
 int match_lines;
 unsigned char eolbyte;
-int warnings;

 /* For error messages. */
 /* The name the program was run with, stripped of any leading path. */
@@ -388,27 +385,6 @@ static enum
     SKIP_DEVICES
   } devices = READ_DEVICES;

-enum warn_type
-  {
-    WARN_ALWAYS = 0,
-    WARN_NEVER,
-    WARN_AUTO,
-  };
-
-static char const *const warn_args[] =
-{
-  "always", "yes", "force",
-  "never", "no", "none",
-  "auto", "tty", "if-tty", NULL
-};
-static enum warn_type const warn_types[] =
-{
-  WARN_ALWAYS, WARN_ALWAYS, WARN_ALWAYS,
-  WARN_NEVER, WARN_NEVER, WARN_NEVER,
-  WARN_AUTO, WARN_AUTO, WARN_AUTO
-};
-ARGMATCH_VERIFY (warn_args, warn_types);
-
 static int grepdir (char const *, struct stats const *);
 #if defined HAVE_DOS_FILE_CONTENTS
 static inline int undossify_input (char *, size_t);
@@ -1783,7 +1759,6 @@ main (int argc, char **argv)
   int opt, cc, status;
   int default_context;
   FILE *fp;
-  enum warn_type w;

   initialize_main (&argc, &argv);
   set_program_name (argv[0]);
@@ -1816,8 +1791,6 @@ main (int argc, char **argv)
   exit_failure = EXIT_TROUBLE;
   atexit (close_stdout);

-  w = getenv ("POSIXLY_CORRECT") ? WARN_NEVER : WARN_ALWAYS;
-
   prepend_default_options (getenv ("GREP_OPTIONS"), &argc, &argv);
   setmatcher (NULL);

@@ -2051,6 +2024,15 @@ main (int argc, char **argv)
             show_help = 1;
         } else
           color_option = 2;
+        if (color_option == 2)
+          {
+            char const *t;
+            if (isatty (STDOUT_FILENO) && (t = getenv ("TERM"))
+                && !STREQ (t, "dumb"))
+              color_option = 1;
+            else
+              color_option = 0;
+          }
         break;

       case EXCLUDE_OPTION:
@@ -2098,13 +2080,6 @@ main (int argc, char **argv)
         /* long options */
         break;

-      case WARN_OPTION:
-        if (optarg)
-          w = XARGMATCH ("--warn", optarg, warn_args, warn_types);
-        else
-          w = WARN_ALWAYS;
-        break;
-
       default:
         usage (EXIT_TROUBLE);
         break;
@@ -2127,28 +2102,6 @@ main (int argc, char **argv)
   if (out_before < 0)
     out_before = default_context;

-  char const *t;
-  if (isatty (STDOUT_FILENO)
-      && (t = getenv ("TERM")) && !STREQ (t, "dumb"))
-    {
-      if (color_option == 2)
-        color_option = 1;
-      if (w == WARN_AUTO)
-        w = WARN_ALWAYS;
-    }
-  else
-    {
-      if (color_option == 2)
-        color_option = 0;
-
-      /* Redirected stdout: we're likely in a script, disable warnings.  */
-      if (w == WARN_AUTO)
-        w = WARN_NEVER;
-    }
-
-  /* assert (w == WARN_ALWAYS || w == WARN_NEVER); */
-  warnings = w == WARN_ALWAYS;
-
   if (color_option)
     {
       /* Legacy.  */
diff --git a/tests/warn-char-classes b/tests/warn-char-classes
index 069489f..25bf640 100644
--- a/tests/warn-char-classes
+++ b/tests/warn-char-classes
@@ -7,12 +7,14 @@ set -x
 echo f > x || framework_failure_
 echo b >> x || framework_failure_
 echo h >> x || framework_failure_
+printf 'grep: character class syntax is [[:space:]], not [:space:]\n' \
+  > exp-err || framework_failure_

 # basic cases

 grep '[:space:]' x 2> err
-test $? = 1 || fail=1
-test -s err || fail=1
+test $? = 2 || fail=1
+compare err exp-err || fail=1

 grep '[[:space:]]' x 2> err
 test $? = 1 || fail=1
--
1.7.2.2.510.g7180a


>From 85e0be54275366ade35333a69d5ba23064e5078f Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Wed, 1 Sep 2010 19:02:38 +0200
Subject: [PATCH 2/2] maint: add lib/version-etc.c to the list in POTFILES.in

* po/POTFILES.in: Add lib/version-etc.c.
---
 po/POTFILES.in |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/po/POTFILES.in b/po/POTFILES.in
index ff8b148..15ded0c 100644
--- a/po/POTFILES.in
+++ b/po/POTFILES.in
@@ -22,6 +22,7 @@ lib/getopt.c
 lib/obstack.c
 lib/quotearg.c
 lib/regcomp.c
+lib/version-etc.c
 lib/xalloc-die.c
 lib/xstrtol-error.c
 src/dfa.c
--
1.7.2.2.510.g7180a



reply via email to

[Prev in Thread] Current Thread [Next in Thread]