grep-commit
grep branch, master, updated. v3.7-70-gc831ffa


From: Paul Eggert
Subject: grep branch, master, updated. v3.7-70-gc831ffa
Date: Sat, 21 May 2022 05:41:27 -0400 (EDT)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  c831ffa1d9a2399e6e4ff44d2bf3825c324812fa (commit)
       via  a368a60eb81ea6e3264e0c8c2cb12f2ee7f0585d (commit)
       via  2169fa36c9235d13bf64e20009fc3a639ca5670a (commit)
      from  e24ab83682f2124b7c8fa59ab05b250e1f4dae94 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=c831ffa1d9a2399e6e4ff44d2bf3825c324812fa


commit c831ffa1d9a2399e6e4ff44d2bf3825c324812fa
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Sat May 21 02:34:49 2022 -0700

    doc: document regex corner cases better
    
    * doc/grep.texi (Environment Variables)
    (Fundamental Structure, Character Classes and Bracket Expressions)
    (The Backslash Character and Special Expressions)
    (Back-references and Subexpressions, Basic vs Extended)
    (Basic vs Extended): Say more precisely what happens with oddball
    regular expressions.

diff --git a/doc/grep.texi b/doc/grep.texi
index 71e19e0..a717e32 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1013,7 +1013,7 @@ They are omitted (i.e., false) by default and become true when specified.
 @cindex national language support
 @cindex NLS
 These variables specify the locale for the @env{LC_COLLATE} category,
-which might affect how range expressions like @samp{[a-z]} are
+which might affect how range expressions like @samp{a-z} are
 interpreted.
 
 @item LC_ALL
@@ -1269,6 +1269,15 @@ A whole expression may be enclosed in parentheses
 to override these precedence rules and form a subexpression.
 An unmatched @samp{)} matches just itself.
 
+Some strings are not valid regular expressions and cause
+@command{grep} to issue a diagnostic and fail.  For example, @samp{xy\1}
+is invalid because there is no parenthesized subexpression for the
+back-reference @samp{\1} to refer to.  Also, some regular expressions
+have unspecified behavior and should be avoided in portable scripts
+even if @command{grep} does not currently diagnose them.  For example,
+@samp{xy\0} has unspecified behavior because @samp{0} is not a special
+character and there is no documentation for the behavior of @samp{\0}.
+
 @node Character Classes and Bracket Expressions
 @section Character Classes and Bracket Expressions
 
@@ -1296,7 +1305,7 @@ order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
 In other locales, the sorting sequence is not specified, and
 @samp{[a-d]} might be equivalent to @samp{[abcd]} or to
 @samp{[aBbCcDd]}, or it might fail to match any character, or the set of
-characters that it matches might even be erratic.
+characters that it matches might be erratic, or it might be invalid.
 To obtain the traditional interpretation
 of bracket expressions, you can use the @samp{C} locale by setting the
 @env{LC_ALL} environment variable to the value @samp{C}.
@@ -1483,6 +1492,13 @@ Match non-whitespace, it is a synonym for @samp{[^[:space:]]}.
 For example, @samp{\brat\b} matches the separate word @samp{rat},
 @samp{\Brat\B} matches @samp{crate} but not @samp{furry rat}.
 
+The behavior of @command{grep} is unspecified if a unescaped backslash
+is not followed by a special character, a nonzero digit, or a
+character in the above list.  Although @command{grep} might issue a
+diagnostic and/or give the backslash an interpretation now, its
+behavior may change if the syntax of regular expressions is extended
+in future versions.
+
 @node Anchoring
 @section Anchoring
 @cindex anchoring
@@ -1508,6 +1524,8 @@ for example, @samp{(a)*\1} fails to match @samp{a}.
 If the parenthesized subexpression matches more than one substring,
 the back-reference refers to the last matched substring;
 for example, @samp{^(ab*)*\1$} matches @samp{ababbabb} but not @samp{ababbab}.
+The back-reference @samp{\@var{n}} is invalid
+if preceded by fewer than @var{n} subexpressions.
 When multiple regular expressions are given with
 @option{-e} or from a file (@samp{-f @var{file}}),
 back-references are local to each expression.
@@ -1530,26 +1548,43 @@ POSIX says they produce unspecified results:
 
 @itemize @bullet
 @item
-Extended regular expressions that use back-references.
+An extended regular expression that uses back-references.
+@item
+A basic regular expression that uses @samp{\?}, @samp{\+}, or @samp{\|}.
+@item
+An empty parenthesized regular expression like @samp{()}.
 @item
-Basic regular expressions that use @samp{\?}, @samp{\+}, or @samp{\|}.
+An empty alternative (as in, e.g, @samp{a|}).
 @item
-Empty parenthesized regular expressions like @samp{()}.
+A repetition operator that immediately follows an empty expression,
+unescaped @samp{$}, or another repetition operator.
 @item
-Empty alternatives (as in, e.g, @samp{a|}).
+An interval expression with a repetition count greater than 255.
 @item
-Repetition operators that immediately follow empty expressions,
-unescaped @samp{$}, or other repetition operators.
+A basic regular expression with unbalanced @samp{\(} or @samp{\)},
+or an extended regular expression with unbalanced @samp{(}.
 @item
-Interval expressions containing repetition counts greater than 255.
+A bracket expression that contains at least three elements, the first
+and last of which are both @samp{:}, or both @samp{.}, or both
+@samp{=}.  For example, it is unspecified whether the bracket expression
+@samp{[:alpha:]} is equivalent to @samp{[[:alpha:]]}, equivalent to
+@samp{[:ahlp]}, or invalid.
+@item
+A range expression like @samp{z-a} that represents zero elements;
+it might never match, or it might be invalid.
+@item
+A range expression outside the POSIX locale.
 @item
 A backslash escaping an ordinary character (e.g., @samp{\S}),
 unless it is a back-reference.
 @item
+An unescaped backslash at the end of a regular expression.
+@item
 An unescaped @samp{[} that is not part of a bracket expression.
 @item
-In extended regular expressions, an unescaped @samp{@{} that is not
-part of an interval expression.
+A @samp{\@{} in a basic regular expression (or an unescaped @samp{@{}
+in an extended regular expression) that does not start an interval
+expression.
 @end itemize
 
 @cindex interval expressions
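
The back-reference rule added in the Back-references and Subexpressions
hunk above can be illustrated with a short Python sketch.  It is not
grep's parser: the function names are hypothetical, and it assumes that
a subexpression counts only once its closing parenthesis has been seen.

```python
def skip_bracket(pattern: str, start: int) -> int:
    """Return the index just past the bracket expression opening at start."""
    j = start + 1
    if j < len(pattern) and pattern[j] == "^":
        j += 1
    if j < len(pattern) and pattern[j] == "]":
        j += 1  # a leading ']' is an ordinary member of the set
    while j < len(pattern):
        if pattern[j] == "[" and j + 1 < len(pattern) and pattern[j + 1] in ":.=":
            # Skip a nested [:class:], [.symbol.] or [=equiv=] element.
            close = pattern.find(pattern[j + 1] + "]", j + 2)
            if close == -1:
                return len(pattern)
            j = close + 2
        elif pattern[j] == "]":
            return j + 1
        else:
            j += 1
    return len(pattern)  # unterminated: treat the rest as bracket content


def invalid_backrefs(pattern: str, extended: bool = True) -> list[str]:
    """List back-references \\n preceded by fewer than n closed groups.

    Assumption: groups are '(' ... ')' when extended is true and
    '\\(' ... '\\)' otherwise; backslashes inside brackets are literal.
    """
    depth = 0    # groups currently open
    closed = 0   # groups completed so far
    bad = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            i = skip_bracket(pattern, i)
        elif ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "123456789":
                if int(nxt) > closed:
                    bad.append("\\" + nxt)
            elif not extended and nxt == "(":
                depth += 1
            elif not extended and nxt == ")" and depth > 0:
                depth -= 1
                closed += 1
            i += 2
        else:
            if extended and ch == "(":
                depth += 1
            elif extended and ch == ")" and depth > 0:
                depth -= 1
                closed += 1
            i += 1
    return bad
```

For example, invalid_backrefs(r"xy\1") returns ["\\1"], while
invalid_backrefs(r"(a)*\1") returns an empty list.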

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=a368a60eb81ea6e3264e0c8c2cb12f2ee7f0585d



http://git.savannah.gnu.org/cgit/grep.git/commit/?id=2169fa36c9235d13bf64e20009fc3a639ca5670a



-----------------------------------------------------------------------
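
The bracket-expression item in the doc/grep.texi change above describes
a common slip: writing [:alpha:] where [[:alpha:]] was meant.  A minimal
Python sketch of that test follows.  The function name is hypothetical,
and the sketch assumes every character between the brackets is a single
element (no nested classes, collating symbols, or leading ^).

```python
def looks_like_bare_class(bracket_expr: str) -> bool:
    """Return True if a bracket expression has the unspecified shape.

    The shape is: at least three elements, with the first and last both
    ':' or both '.' or both '='.  Each character is taken as one element,
    which is a simplifying assumption of this sketch.
    """
    if len(bracket_expr) < 2 or bracket_expr[0] != "[" or bracket_expr[-1] != "]":
        raise ValueError("expected a complete bracket expression such as '[abc]'")
    inner = bracket_expr[1:-1]
    return len(inner) >= 3 and inner[0] in ":.=" and inner[0] == inner[-1]
```

Here looks_like_bare_class("[:alpha:]") is true, while both
"[[:alpha:]]" and "[:ahlp]" give false.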

Summary of changes:
 NEWS                    |  3 +++
 doc/grep.texi           | 65 ++++++++++++++++++++++++++++++++++++-------------
 src/dfasearch.c         |  8 ++----
 tests/spencer1.tests    |  4 ++-
 tests/warn-char-classes |  4 ---
 5 files changed, 56 insertions(+), 28 deletions(-)


hooks/post-receive
-- 
grep


