grep-commit

Changes to grep/manual/grep.html,v


From: Jim Meyering
Subject: Changes to grep/manual/grep.html,v
Date: Sat, 3 Sep 2022 15:33:15 -0400 (EDT)

CVSROOT:        /webcvs/grep
Module name:    grep
Changes by:     Jim Meyering <meyering> 22/09/03 15:33:15

Index: grep.html
===================================================================
RCS file: /webcvs/grep/grep/manual/grep.html,v
retrieving revision 1.32
retrieving revision 1.33
diff -u -b -r1.32 -r1.33
--- grep.html   14 Aug 2021 20:46:39 -0000      1.32
+++ grep.html   3 Sep 2022 19:33:13 -0000       1.33
@@ -5,7 +5,7 @@
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <!-- This manual is for grep, a pattern matching engine.
 
-Copyright (C) 1999-2002, 2005, 2008-2021 Free Software Foundation,
+Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
 Inc.
 
 Permission is granted to copy, distribute and/or modify this document
@@ -14,10 +14,10 @@
 Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
 Texts.  A copy of the license is included in the section entitled
 "GNU Free Documentation License". -->
-<title>GNU Grep 3.7</title>
+<title>GNU Grep 3.8</title>
 
-<meta name="description" content="GNU Grep 3.7">
-<meta name="keywords" content="GNU Grep 3.7">
+<meta name="description" content="GNU Grep 3.8">
+<meta name="keywords" content="GNU Grep 3.8">
 <meta name="resource-type" content="document">
 <meta name="distribution" content="global">
 <meta name="Generator" content="makeinfo">
@@ -53,7 +53,7 @@
 </head>
 
 <body lang="en">
-<h1 class="settitle" align="center">GNU Grep 3.7</h1>
+<h1 class="settitle" align="center">GNU Grep 3.8</h1>
 
 
 
@@ -71,11 +71,11 @@
 
 <p><code>grep</code> prints lines that contain a match for one or more 
patterns.
 </p>
-<p>This manual is for version 3.7 of GNU Grep.
+<p>This manual is for version 3.8 of GNU Grep.
 </p>
 <p>This manual is for <code>grep</code>, a pattern matching engine.
 </p>
-<p>Copyright &copy; 1999&ndash;2002, 2005, 2008&ndash;2021 Free Software 
Foundation,
+<p>Copyright &copy; 1999&ndash;2002, 2005, 2008&ndash;2022 Free Software 
Foundation,
 Inc.
 </p>
 <blockquote>
@@ -116,12 +116,13 @@
   <ul class="no-bullet">
     <li><a id="toc-Fundamental-Structure-1" href="#Fundamental-Structure">3.1 
Fundamental Structure</a></li>
     <li><a id="toc-Character-Classes-and-Bracket-Expressions-1" 
href="#Character-Classes-and-Bracket-Expressions">3.2 Character Classes and 
Bracket Expressions</a></li>
-    <li><a id="toc-The-Backslash-Character-and-Special-Expressions-1" 
href="#The-Backslash-Character-and-Special-Expressions">3.3 The Backslash 
Character and Special Expressions</a></li>
+    <li><a id="toc-Special-Backslash-Expressions-1" 
href="#Special-Backslash-Expressions">3.3 Special Backslash Expressions</a></li>
     <li><a id="toc-Anchoring-1" href="#Anchoring">3.4 Anchoring</a></li>
     <li><a id="toc-Back_002dreferences-and-Subexpressions-1" 
href="#Back_002dreferences-and-Subexpressions">3.5 Back-references and 
Subexpressions</a></li>
     <li><a id="toc-Basic-vs-Extended-Regular-Expressions" 
href="#Basic-vs-Extended">3.6 Basic vs Extended Regular Expressions</a></li>
-    <li><a id="toc-Character-Encoding-1" href="#Character-Encoding">3.7 
Character Encoding</a></li>
-    <li><a id="toc-Matching-Non_002dASCII-and-Non_002dprintable-Characters" 
href="#Matching-Non_002dASCII">3.8 Matching Non-ASCII and Non-printable 
Characters</a></li>
+    <li><a id="toc-Problematic-Regular-Expressions" 
href="#Problematic-Expressions">3.7 Problematic Regular Expressions</a></li>
+    <li><a id="toc-Character-Encoding-1" href="#Character-Encoding">3.8 
Character Encoding</a></li>
+    <li><a id="toc-Matching-Non_002dASCII-and-Non_002dprintable-Characters" 
href="#Matching-Non_002dASCII">3.9 Matching Non-ASCII and Non-printable 
Characters</a></li>
   </ul></li>
   <li><a id="toc-Usage-1" href="#Usage">4 Usage</a></li>
   <li><a id="toc-Performance-1" href="#Performance">5 Performance</a></li>
@@ -342,7 +343,7 @@
 regular expression with &lsquo;<samp>\&lt;</samp>&rsquo; and 
&lsquo;<samp>\&gt;</samp>&rsquo;.  For example, although
 &lsquo;<samp>grep -w @</samp>&rsquo; matches a line containing only 
&lsquo;<samp>@</samp>&rsquo;, &lsquo;<samp>grep
 '\&lt;@\&gt;'</samp>&rsquo; cannot match any line because 
&lsquo;<samp>@</samp>&rsquo; is not a
-word constituent.  See <a 
href="#The-Backslash-Character-and-Special-Expressions">The Backslash Character 
and Special Expressions</a>.
+word constituent.  See <a href="#Special-Backslash-Expressions">Special 
Backslash Expressions</a>.
 </p>
 </dd>
 <dt id='index-_002dx'><span><samp>-x</samp><a href='#index-_002dx' 
class='copiable-anchor'> &para;</a></span></dt>
@@ -382,7 +383,7 @@
 <dt><span><samp>--colour[=<var>WHEN</var>]</samp></span></dt>
 <dd><span id="index-_002d_002dcolour"></span>
 <span id="index-highlight_002c-color_002c-colour"></span>
-<p>Surround the matched (non-empty) strings, matching lines, context lines,
+<p>Surround matched non-empty strings, matching lines, context lines,
 file names, line numbers, byte offsets, and separators (for fields and
 groups of context lines) with escape sequences to display them in color
 on the terminal.
@@ -390,11 +391,14 @@
 and default to 
&lsquo;<samp>ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36</samp>&rsquo;
 for bold red matched text, magenta file names, green line numbers,
 green byte offsets, cyan separators, and default terminal colors otherwise.
-The deprecated environment variable <code>GREP_COLOR</code> is still supported,
-but its setting does not have priority;
-it defaults to &lsquo;<samp>01;31</samp>&rsquo; (bold red)
-which only covers the color for matched text.
-<var>WHEN</var> is &lsquo;<samp>never</samp>&rsquo;, 
&lsquo;<samp>always</samp>&rsquo;, or &lsquo;<samp>auto</samp>&rsquo;.
+See <a href="#Environment-Variables">Environment Variables</a>.
+</p>
+<p><var>WHEN</var> is &lsquo;<samp>always</samp>&rsquo; to use colors, 
&lsquo;<samp>never</samp>&rsquo; to not use
+colors, or &lsquo;<samp>auto</samp>&rsquo; to use colors if standard output is 
associated
+with a terminal device and the <code>TERM</code> environment variable&rsquo;s 
value
+suggests that the terminal supports colors.
+Plain <samp>--color</samp> is treated like <samp>--color=auto</samp>;
+if no <samp>--color</samp> option is given, the default is 
<samp>--color=never</samp>.
 </p>
 </dd>
 <dt id='index-_002dL'><span><samp>-L</samp><a href='#index-_002dL' 
class='copiable-anchor'> &para;</a></span></dt>
@@ -422,7 +426,11 @@
 <dd><span id="index-_002d_002dmax_002dcount"></span>
 <span id="index-max_002dcount"></span>
 <p>Stop after the first <var>num</var> selected lines.
-If the input is standard input from a regular file,
+If <var>num</var> is zero, <code>grep</code> stops right away without reading 
input.
+A <var>num</var> of -1 is treated as infinity and <code>grep</code>
+does not stop; this is the default.
+</p>
+<p>If the input is standard input from a regular file,
 and <var>num</var> selected lines are output,
 <code>grep</code> ensures that the standard input is positioned
 just after the last selected line before exiting,
@@ -462,7 +470,7 @@
 <dt><span><samp>--only-matching</samp></span></dt>
 <dd><span id="index-_002d_002donly_002dmatching"></span>
 <span id="index-only-matching"></span>
-<p>Print only the matched (non-empty) parts of matching lines,
+<p>Print only the matched non-empty parts of matching lines,
 with each such part on a separate output line.
 Output lines use the same delimiters as input, and delimiters are null
 bytes if <samp>-z</samp> (<samp>--null-data</samp>) is also used (see <a 
href="#Other-Options">Other Options</a>).
@@ -478,6 +486,9 @@
 Exit immediately with zero status if any match is found,
 even if an error was detected.
 Also see the <samp>-s</samp> or <samp>--no-messages</samp> option.
+Portability note: Solaris 10 <code>grep</code> lacks <samp>-q</samp>;
+portable shell scripts typically can redirect standard output to
+<samp>/dev/null</samp> instead of using <samp>-q</samp>.
 (<samp>-q</samp> is specified by POSIX.)
 </p>
 </dd>
@@ -486,17 +497,6 @@
 <dd><span id="index-_002d_002dno_002dmessages"></span>
 <span id="index-suppress-error-messages"></span>
 <p>Suppress error messages about nonexistent or unreadable files.
-Portability note:
-unlike GNU <code>grep</code>,
-7th Edition Unix <code>grep</code> did not conform to POSIX,
-because it lacked <samp>-q</samp>
-and its <samp>-s</samp> option behaved like
-GNU <code>grep</code>&rsquo;s <samp>-q</samp> option.<a id="DOCF1" 
href="#FOOT1"><sup>1</sup></a>
-USG-style <code>grep</code> also lacked <samp>-q</samp>
-but its <samp>-s</samp> option behaved like GNU <code>grep</code>&rsquo;s.
-Portable shell scripts should avoid both
-<samp>-q</samp> and <samp>-s</samp> and should redirect
-standard and error output to <samp>/dev/null</samp> instead.
 (<samp>-s</samp> is specified by POSIX.)
 </p>
 </dd>
@@ -710,7 +710,7 @@
 suppresses output after null input binary data is discovered,
 and suppresses output lines that contain improperly encoded data.
 When some output is suppressed, <code>grep</code> follows any output
-with a one-line message saying that a binary file matches.
+with a message to standard error saying that a binary file matches.
 </p>
 <p>If <var>type</var> is &lsquo;<samp>without-match</samp>&rsquo;,
 when <code>grep</code> discovers null input binary data
@@ -922,8 +922,10 @@
 </div>
 <span id="Environment-Variables-1"></span><h3 class="section">2.2 Environment 
Variables</h3>
 
-<p>The behavior of <code>grep</code> is affected
-by the following environment variables.
+<p>The behavior of <code>grep</code> is affected by several environment
+variables, the most important of which control the locale, which
+specifies how <code>grep</code> interprets characters in its patterns and
+data.
 </p>
 <span id="index-LANGUAGE-environment-variable"></span>
 <span id="index-LC_005fALL-environment-variable"></span>
@@ -935,8 +937,8 @@
 in that order.
 The first of these variables that is set specifies the locale.
 For example, if <code>LC_ALL</code> is not set,
-but <code>LC_COLLATE</code> is set to &lsquo;<samp>pt_BR</samp>&rsquo;,
-then the Brazilian Portuguese locale is used
+but <code>LC_COLLATE</code> is set to &lsquo;<samp>pt_BR.UTF-8</samp>&rsquo;,
+then a Brazilian Portuguese locale is used
 for the <code>LC_COLLATE</code> category.
 As a special case for <code>LC_MESSAGES</code> only, the environment variable
 <code>LANGUAGE</code> can contain a colon-separated list of languages that
@@ -948,7 +950,32 @@
 with national language support (NLS).
 The shell command <code>locale -a</code> lists locales that are currently 
available.
 </p>
-<p>Many of the environment variables in the following list let you
+<span id="index-environment-variables"></span>
+<p>The following environment variables affect the behavior of 
<code>grep</code>.
+</p>
+<dl compact="compact">
+<dt 
id='index-GREP_005fCOLOR-environment-variable'><span><code>GREP_COLOR</code><a 
href='#index-GREP_005fCOLOR-environment-variable' class='copiable-anchor'> 
&para;</a></span></dt>
+<dd><span id="index-highlight-markers"></span>
+<p>This obsolescent variable interacts with <code>GREP_COLORS</code>
+confusingly, and <code>grep</code> warns if it is set and is not
+overridden by <code>GREP_COLORS</code>.  Instead of
+&lsquo;<samp>GREP_COLOR='<var>color</var>'</samp>&rsquo;, you can use
+&lsquo;<samp>GREP_COLORS='mt=<var>color</var>'</samp>&rsquo;.
+</p>
+</dd>
+<dt 
id='index-GREP_005fCOLORS-environment-variable'><span><code>GREP_COLORS</code><a
 href='#index-GREP_005fCOLORS-environment-variable' class='copiable-anchor'> 
&para;</a></span></dt>
+<dd><span id="index-highlight-markers-1"></span>
+<p>This variable specifies the colors and other attributes
+used to highlight various parts of the output.
+Its value is a colon-separated list of <code>terminfo</code> capabilities
+that defaults to 
&lsquo;<samp>ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36</samp>&rsquo;
+with the &lsquo;<samp>rv</samp>&rsquo; and &lsquo;<samp>ne</samp>&rsquo; 
boolean capabilities omitted (i.e., false).
+The two-letter capability names
+refer to terminal &ldquo;capabilities,&rdquo; the ability
+of a terminal to highlight text, or change its color, and so on.
+These capabilities are stored in an online database and accessed by
+the <code>terminfo</code> library.
+Non-empty capability values
 control highlighting using
 Select Graphic Rendition (SGR)
 commands interpreted by the terminal or terminal emulator.
@@ -976,37 +1003,7 @@
 and &lsquo;<samp>48;5;0</samp>&rsquo; to &lsquo;<samp>48;5;255</samp>&rsquo;
 for 88-color and 256-color modes background colors.
 </p>
-<p>The two-letter names used in the <code>GREP_COLORS</code> environment 
variable
-(and some of the others) refer to terminal &ldquo;capabilities,&rdquo; the 
ability
-of a terminal to highlight text, or change its color, and so on.
-These capabilities are stored in an online database and accessed by
-the <code>terminfo</code> library.
-</p>
-<span id="index-environment-variables"></span>
-
-<dl compact="compact">
-<dt 
id='index-GREP_005fCOLOR-environment-variable'><span><code>GREP_COLOR</code><a 
href='#index-GREP_005fCOLOR-environment-variable' class='copiable-anchor'> 
&para;</a></span></dt>
-<dd><span id="index-highlight-markers"></span>
-<p>This variable specifies the color used to highlight matched (non-empty) 
text.
-It is deprecated in favor of <code>GREP_COLORS</code>, but still supported.
-The &lsquo;<samp>mt</samp>&rsquo;, &lsquo;<samp>ms</samp>&rsquo;, and 
&lsquo;<samp>mc</samp>&rsquo; capabilities of <code>GREP_COLORS</code>
-have priority over it.
-It can only specify the color used to highlight
-the matching non-empty text in any matching line
-(a selected line when the <samp>-v</samp> command-line option is omitted,
-or a context line when <samp>-v</samp> is specified).
-The default is &lsquo;<samp>01;31</samp>&rsquo;,
-which means a bold red foreground text on the terminal&rsquo;s default 
background.
-</p>
-</dd>
-<dt 
id='index-GREP_005fCOLORS-environment-variable'><span><code>GREP_COLORS</code><a
 href='#index-GREP_005fCOLORS-environment-variable' class='copiable-anchor'> 
&para;</a></span></dt>
-<dd><span id="index-highlight-markers-1"></span>
-<p>This variable specifies the colors and other attributes
-used to highlight various parts of the output.
-Its value is a colon-separated list of <code>terminfo</code> capabilities
-that defaults to 
&lsquo;<samp>ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36</samp>&rsquo;
-with the &lsquo;<samp>rv</samp>&rsquo; and &lsquo;<samp>ne</samp>&rsquo; 
boolean capabilities omitted (i.e., false).
-Supported capabilities are as follows.
+<p>Supported capabilities are as follows.
 </p>
 <dl compact="compact">
 <dt id='index-sl-GREP_005fCOLORS-capability'><span><code>sl=</code><a 
href='#index-sl-GREP_005fCOLORS-capability' class='copiable-anchor'> 
&para;</a></span></dt>
@@ -1116,7 +1113,7 @@
 <span id="index-national-language-support"></span>
 <span id="index-NLS"></span>
 <p>These variables specify the locale for the <code>LC_COLLATE</code> category,
-which might affect how range expressions like &lsquo;<samp>[a-z]</samp>&rsquo; 
are
+which might affect how range expressions like &lsquo;<samp>a-z</samp>&rsquo; 
are
 interpreted.
 </p>
 </dd>
@@ -1159,8 +1156,11 @@
 by default,
 such options are permuted to the front of the operand list
 and are treated as options.
-Also, <code>POSIXLY_CORRECT</code> disables special handling of an
-invalid bracket expression.  See <a 
href="#invalid_002dbracket_002dexpr">invalid-bracket-expr</a>.
+</p>
+</dd>
+<dt id='index-TERM-environment-variable'><span><code>TERM</code><a 
href='#index-TERM-environment-variable' class='copiable-anchor'> 
&para;</a></span></dt>
+<dd><p>This variable specifies the output terminal type, which can affect
+what the <samp>--color</samp> option does.  See <a 
href="#General-Output-Control">General Output Control</a>.
 </p>
 </dd>
 <dt 
id='index-_005fN_005fGNU_005fnonoption_005fargv_005fflags_005f-environment-variable'><span><code>_<var>N</var>_GNU_nonoption_argv_flags_</code><a
 
href='#index-_005fN_005fGNU_005fnonoption_005fargv_005fflags_005f-environment-variable'
 class='copiable-anchor'> &para;</a></span></dt>
@@ -1269,15 +1269,6 @@
 </dd>
 </dl>
 
-<p>In addition,
-two variant programs <code>egrep</code> and <code>fgrep</code> are available.
-<code>egrep</code> is the same as &lsquo;<samp>grep&nbsp;-E</samp>&rsquo;.
-<code>fgrep</code> is the same as &lsquo;<samp>grep&nbsp;-F</samp>&rsquo;.
-Direct invocation as either
-<code>egrep</code> or <code>fgrep</code> is deprecated,
-but is provided to allow historical applications
-that rely on them to run unmodified.
-</p>
 
 <hr>
 </div>
@@ -1297,25 +1288,26 @@
 three different versions of regular expression syntax:
 basic (BRE), extended (ERE), and Perl-compatible (PCRE).
 In GNU <code>grep</code>,
-there is no difference in available functionality between the basic and
-extended syntaxes.
+there is no difference in available functionality between basic and
+extended syntax.
 In other implementations, basic regular expressions are less powerful.
 The following description applies to extended regular expressions;
 differences for basic regular expressions are summarized afterwards.
 Perl-compatible regular expressions give additional functionality, and
-are documented in the <i>pcresyntax</i>(3) and <i>pcrepattern</i>(3) manual
+are documented in the <i>pcre2syntax</i>(3) and <i>pcre2pattern</i>(3) manual
 pages, but work only if PCRE is available in the system.
 </p>
 
 <ul class="section-toc">
 <li><a href="#Fundamental-Structure" accesskey="1">Fundamental 
Structure</a></li>
 <li><a href="#Character-Classes-and-Bracket-Expressions" 
accesskey="2">Character Classes and Bracket Expressions</a></li>
-<li><a href="#The-Backslash-Character-and-Special-Expressions" 
accesskey="3">The Backslash Character and Special Expressions</a></li>
+<li><a href="#Special-Backslash-Expressions" accesskey="3">Special Backslash 
Expressions</a></li>
 <li><a href="#Anchoring" accesskey="4">Anchoring</a></li>
 <li><a href="#Back_002dreferences-and-Subexpressions" 
accesskey="5">Back-references and Subexpressions</a></li>
 <li><a href="#Basic-vs-Extended" accesskey="6">Basic vs Extended Regular 
Expressions</a></li>
-<li><a href="#Character-Encoding" accesskey="7">Character Encoding</a></li>
-<li><a href="#Matching-Non_002dASCII" accesskey="8">Matching Non-ASCII and 
Non-printable Characters</a></li>
+<li><a href="#Problematic-Expressions" accesskey="7">Problematic Regular 
Expressions</a></li>
+<li><a href="#Character-Encoding" accesskey="8">Character Encoding</a></li>
+<li><a href="#Matching-Non_002dASCII" accesskey="9">Matching Non-ASCII and 
Non-printable Characters</a></li>
 </ul>
 <hr>
 <div class="section" id="Fundamental-Structure">
@@ -1396,9 +1388,10 @@
 matches any string formed by concatenating two substrings
 that respectively match the concatenated expressions.
 </p>
-<p>Two regular expressions may be joined by the infix operator 
&lsquo;<samp>|</samp>&rsquo;;
-the resulting regular expression
-matches any string matching either alternate expression.
+<span id="index-alternatives-in-regular-expressions"></span>
+<p>Two regular expressions may be joined by the infix operator 
&lsquo;<samp>|</samp>&rsquo;.
+The resulting regular expression matches any string matching either of
+the two expressions, which are called <em>alternatives</em>.
 </p>
 <p>Repetition takes precedence over concatenation,
 which in turn takes precedence over alternation.
@@ -1406,12 +1399,15 @@
 to override these precedence rules and form a subexpression.
 An unmatched &lsquo;<samp>)</samp>&rsquo; matches just itself.
 </p>
+<p>Not every character string is a valid regular expression.
+See <a href="#Problematic-Expressions">Problematic Regular Expressions</a>.
+</p>
 <hr>
 </div>
 <div class="section" id="Character-Classes-and-Bracket-Expressions">
 <div class="header">
 <p>
-Next: <a href="#The-Backslash-Character-and-Special-Expressions" accesskey="n" 
rel="next">The Backslash Character and Special Expressions</a>, Previous: <a 
href="#Fundamental-Structure" accesskey="p" rel="prev">Fundamental 
Structure</a>, Up: <a href="#Regular-Expressions" accesskey="u" 
rel="up">Regular Expressions</a> &nbsp; [<a href="#SEC_Contents" title="Table 
of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
+Next: <a href="#Special-Backslash-Expressions" accesskey="n" 
rel="next">Special Backslash Expressions</a>, Previous: <a 
href="#Fundamental-Structure" accesskey="p" rel="prev">Fundamental 
Structure</a>, Up: <a href="#Regular-Expressions" accesskey="u" 
rel="up">Regular Expressions</a> &nbsp; [<a href="#SEC_Contents" title="Table 
of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
 </div>
 <span id="Character-Classes-and-Bracket-Expressions-1"></span><h3 
class="section">3.2 Character Classes and Bracket Expressions</h3>
 
@@ -1439,7 +1435,7 @@
 In other locales, the sorting sequence is not specified, and
 &lsquo;<samp>[a-d]</samp>&rsquo; might be equivalent to 
&lsquo;<samp>[abcd]</samp>&rsquo; or to
 &lsquo;<samp>[aBbCcDd]</samp>&rsquo;, or it might fail to match any character, 
or the set of
-characters that it matches might even be erratic.
+characters that it matches might be erratic, or it might be invalid.
 To obtain the traditional interpretation
 of bracket expressions, you can use the &lsquo;<samp>C</samp>&rsquo; locale by 
setting the
 <code>LC_ALL</code> environment variable to the value 
&lsquo;<samp>C</samp>&rsquo;.
@@ -1541,11 +1537,10 @@
 part of the symbolic names, and must be included in addition to
 the brackets delimiting the bracket expression.
 </p>
-<span id="invalid_002dbracket_002dexpr"></span><p>If you mistakenly omit the 
outer brackets, and search for say, &lsquo;<samp>[:upper:]</samp>&rsquo;,
+<p>If you mistakenly omit the outer brackets, and search for say, 
&lsquo;<samp>[:upper:]</samp>&rsquo;,
 GNU <code>grep</code> prints a diagnostic and exits with status 2, on
-the assumption that you did not intend to search for the nominally
-equivalent regular expression: &lsquo;<samp>[:epru]</samp>&rsquo;.
-Set the <code>POSIXLY_CORRECT</code> environment variable to disable this 
feature.
+the assumption that you did not intend to search for the
+regular expression &lsquo;<samp>[:epru]</samp>&rsquo;.
 </p>
 <p>Special characters lose their special meaning inside bracket expressions.
 </p>
@@ -1583,7 +1578,7 @@
 </dd>
 <dt><span>&lsquo;<samp>-</samp>&rsquo;</span></dt>
 <dd><p>represents the range if it&rsquo;s not first or last in a list or the 
ending point
-of a range.
+of a range.  To make the &lsquo;<samp>-</samp>&rsquo; a list item, it is best 
to put it last.
 </p>
 </dd>
 <dt><span>&lsquo;<samp>^</samp>&rsquo;</span></dt>
@@ -1596,12 +1591,12 @@
 
 <hr>
 </div>
-<div class="section" id="The-Backslash-Character-and-Special-Expressions">
+<div class="section" id="Special-Backslash-Expressions">
 <div class="header">
 <p>
 Next: <a href="#Anchoring" accesskey="n" rel="next">Anchoring</a>, Previous: 
<a href="#Character-Classes-and-Bracket-Expressions" accesskey="p" 
rel="prev">Character Classes and Bracket Expressions</a>, Up: <a 
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> 
&nbsp; [<a href="#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
 </div>
-<span id="The-Backslash-Character-and-Special-Expressions-1"></span><h3 
class="section">3.3 The Backslash Character and Special Expressions</h3>
+<span id="Special-Backslash-Expressions-1"></span><h3 class="section">3.3 
Special Backslash Expressions</h3>
 <span id="index-backslash"></span>
 
 <p>The &lsquo;<samp>\</samp>&rsquo; character followed by a special character 
is a regular
@@ -1643,17 +1638,32 @@
 <dd><p>Match non-whitespace, it is a synonym for 
&lsquo;<samp>[^[:space:]]</samp>&rsquo;.
 </p>
 </dd>
+<dt><span>&lsquo;<samp>\]</samp>&rsquo;</span></dt>
+<dd><p>Match &lsquo;<samp>]</samp>&rsquo;.
+</p>
+</dd>
+<dt><span>&lsquo;<samp>\}</samp>&rsquo;</span></dt>
+<dd><p>Match &lsquo;<samp>}</samp>&rsquo;.
+</p>
+</dd>
 </dl>
 
 <p>For example, &lsquo;<samp>\brat\b</samp>&rsquo; matches the separate word 
&lsquo;<samp>rat</samp>&rsquo;,
 &lsquo;<samp>\Brat\B</samp>&rsquo; matches &lsquo;<samp>crate</samp>&rsquo; 
but not &lsquo;<samp>furry rat</samp>&rsquo;.
 </p>
+<p>The behavior of <code>grep</code> is unspecified if a unescaped backslash
+is not followed by a special character, a nonzero digit, or a
+character in the above list.  Although <code>grep</code> might issue a
+diagnostic and/or give the backslash an interpretation now, its
+behavior may change if the syntax of regular expressions is extended
+in future versions.
+</p>
 <hr>
 </div>
 <div class="section" id="Anchoring">
 <div class="header">
 <p>
-Next: <a href="#Back_002dreferences-and-Subexpressions" accesskey="n" 
rel="next">Back-references and Subexpressions</a>, Previous: <a 
href="#The-Backslash-Character-and-Special-Expressions" accesskey="p" 
rel="prev">The Backslash Character and Special Expressions</a>, Up: <a 
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> 
&nbsp; [<a href="#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
+Next: <a href="#Back_002dreferences-and-Subexpressions" accesskey="n" 
rel="next">Back-references and Subexpressions</a>, Previous: <a 
href="#Special-Backslash-Expressions" accesskey="p" rel="prev">Special 
Backslash Expressions</a>, Up: <a href="#Regular-Expressions" accesskey="u" 
rel="up">Regular Expressions</a> &nbsp; [<a href="#SEC_Contents" title="Table 
of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
 </div>
 <span id="Anchoring-1"></span><h3 class="section">3.4 Anchoring</h3>
 <span id="index-anchoring"></span>
@@ -1696,58 +1706,175 @@
 <div class="section" id="Basic-vs-Extended">
 <div class="header">
 <p>
-Next: <a href="#Character-Encoding" accesskey="n" rel="next">Character 
Encoding</a>, Previous: <a href="#Back_002dreferences-and-Subexpressions" 
accesskey="p" rel="prev">Back-references and Subexpressions</a>, Up: <a 
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> 
&nbsp; [<a href="#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
+Next: <a href="#Problematic-Expressions" accesskey="n" rel="next">Problematic 
Regular Expressions</a>, Previous: <a 
href="#Back_002dreferences-and-Subexpressions" accesskey="p" 
rel="prev">Back-references and Subexpressions</a>, Up: <a 
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> 
&nbsp; [<a href="#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
 </div>
 <span id="Basic-vs-Extended-Regular-Expressions"></span><h3 
class="section">3.6 Basic vs Extended Regular Expressions</h3>
 <span id="index-basic-regular-expressions"></span>
 
-<p>In basic regular expressions the characters &lsquo;<samp>?</samp>&rsquo;, 
&lsquo;<samp>+</samp>&rsquo;,
+<p>Basic regular expressions differ from extended regular expressions
+in the following ways:
+</p>
+<ul>
+<li> The characters &lsquo;<samp>?</samp>&rsquo;, &lsquo;<samp>+</samp>&rsquo;,
 &lsquo;<samp>{</samp>&rsquo;, &lsquo;<samp>|</samp>&rsquo;, 
&lsquo;<samp>(</samp>&rsquo;, and &lsquo;<samp>)</samp>&rsquo; lose their 
special meaning;
 instead use the backslashed versions &lsquo;<samp>\?</samp>&rsquo;, 
&lsquo;<samp>\+</samp>&rsquo;, &lsquo;<samp>\{</samp>&rsquo;,
 &lsquo;<samp>\|</samp>&rsquo;, &lsquo;<samp>\(</samp>&rsquo;, and 
&lsquo;<samp>\)</samp>&rsquo;.  Also, a backslash is needed
-before an interval expression&rsquo;s closing &lsquo;<samp>}</samp>&rsquo;, 
and an unmatched
-<code>\)</code> is invalid.
+before an interval expression&rsquo;s closing &lsquo;<samp>}</samp>&rsquo;.
+
+</li><li> An unmatched &lsquo;<samp>\)</samp>&rsquo; is invalid.
+
+</li><li> If an unescaped &lsquo;<samp>^</samp>&rsquo; appears neither first, 
nor directly after
+&lsquo;<samp>\(</samp>&rsquo; or &lsquo;<samp>\|</samp>&rsquo;, it is treated 
like an ordinary character and
+is not an anchor.
+
+</li><li> If an unescaped &lsquo;<samp>$</samp>&rsquo; appears neither last, 
nor directly before
+&lsquo;<samp>\|</samp>&rsquo; or &lsquo;<samp>\)</samp>&rsquo;, it is treated 
like an ordinary character and
+is not an anchor.
+
+</li><li> If an unescaped &lsquo;<samp>*</samp>&rsquo; appears first, or 
appears directly after
+&lsquo;<samp>\(</samp>&rsquo; or &lsquo;<samp>\|</samp>&rsquo; or anchoring 
&lsquo;<samp>^</samp>&rsquo;, it is treated like an
+ordinary character and is not a repetition operator.
+</li></ul>
+
+<hr>
+</div>
+<div class="section" id="Problematic-Expressions">
+<div class="header">
+<p>
+Next: <a href="#Character-Encoding" accesskey="n" rel="next">Character 
Encoding</a>, Previous: <a href="#Basic-vs-Extended" accesskey="p" 
rel="prev">Basic vs Extended Regular Expressions</a>, Up: <a 
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> 
&nbsp; [<a href="#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
+</div>
+<span id="Problematic-Regular-Expressions"></span><h3 class="section">3.7 
Problematic Regular Expressions</h3>
+
+<span id="index-invalid-regular-expressions"></span>
+<span id="index-unspecified-behavior-in-regular-expressions"></span>
+<p>Some strings are <em>invalid regular expressions</em> and cause
+<code>grep</code> to issue a diagnostic and fail.  For example, 
&lsquo;<samp>xy\1</samp>&rsquo;
+is invalid because there is no parenthesized subexpression for the
+back-reference &lsquo;<samp>\1</samp>&rsquo; to refer to.
+</p>
+<p>Also, some regular expressions have <em>unspecified behavior</em> and
+should be avoided even if <code>grep</code> does not currently diagnose
+them.  For example, &lsquo;<samp>xy\0</samp>&rsquo; has unspecified behavior 
because
+&lsquo;<samp>0</samp>&rsquo; is not a special character and 
&lsquo;<samp>\0</samp>&rsquo; is not a special
+backslash expression (see <a href="#Special-Backslash-Expressions">Special 
Backslash Expressions</a>).
+Unspecified behavior can be particularly problematic because the set
+of matched strings might be only partially specified, or not be
+specified at all, or the expression might even be invalid.
+</p>
+<p>The following regular expression constructs are invalid on all
+platforms conforming to POSIX, so portable scripts can assume that
+<code>grep</code> rejects these constructs:
 </p>
-<p>Portable scripts should avoid the following constructs, as
-POSIX says they produce undefined results:
+<ul>
+<li> A basic regular expression containing a back-reference 
&lsquo;<samp>\<var>n</var></samp>&rsquo;
+preceded by fewer than <var>n</var> closing parentheses.  For example,
+&lsquo;<samp>\(a\)\2</samp>&rsquo; is invalid.
+
+</li><li> A bracket expression containing &lsquo;<samp>[:</samp>&rsquo; that 
does not start a
+character class; and similarly for &lsquo;<samp>[=</samp>&rsquo; and 
&lsquo;<samp>[.</samp>&rsquo;.  For
+example, &lsquo;<samp>[a[:b]</samp>&rsquo; and 
&lsquo;<samp>[a[:ouch:]b]</samp>&rsquo; are invalid.
+</li></ul>
+
+<p>GNU <code>grep</code> treats the following constructs as invalid.
+However, other <code>grep</code> implementations might allow them, so
+portable scripts should not rely on their being invalid:
 </p>
 <ul>
-<li> Extended regular expressions that use back-references.
-</li><li> Basic regular expressions that use &lsquo;<samp>\?</samp>&rsquo;, &lsquo;<samp>\+</samp>&rsquo;, or &lsquo;<samp>\|</samp>&rsquo;.
-</li><li> Empty parenthesized regular expressions like &lsquo;<samp>()</samp>&rsquo;.
-</li><li> Empty alternatives (as in, e.g, &lsquo;<samp>a|</samp>&rsquo;).
-</li><li> Repetition operators that immediately follow empty expressions,
-unescaped &lsquo;<samp>$</samp>&rsquo;, or other repetition operators.
-</li><li> A backslash escaping an ordinary character (e.g., &lsquo;<samp>\S</samp>&rsquo;),
-unless it is a back-reference.
-</li><li> An unescaped &lsquo;<samp>[</samp>&rsquo; that is not part of a bracket expression.
-</li><li> In extended regular expressions, an unescaped &lsquo;<samp>{</samp>&rsquo; that is not
-part of an interval expression.
+<li> Unescaped &lsquo;<samp>\</samp>&rsquo; at the end of a regular expression.
+
+</li><li> Unescaped &lsquo;<samp>[</samp>&rsquo; that does not start a bracket expression.
+
+</li><li> A &lsquo;<samp>\{</samp>&rsquo; in a basic regular expression that does not start an
+interval expression.
+
+</li><li> A basic regular expression with unbalanced &lsquo;<samp>\(</samp>&rsquo; or &lsquo;<samp>\)</samp>&rsquo;,
+or an extended regular expression with unbalanced &lsquo;<samp>(</samp>&rsquo;.
+
+</li><li> In the POSIX locale, a range expression like &lsquo;<samp>z-a</samp>&rsquo; that
+represents zero elements.  A non-GNU <code>grep</code> might treat it as
+a valid range that never matches.
+
+</li><li> An interval expression with a repetition count greater than 32767.
+(The portable POSIX limit is 255, and even interval expressions with
+smaller counts can be impractically slow on all known implementations.)
+
+</li><li> A bracket expression that contains at least three elements, the first
+and last of which are both &lsquo;<samp>:</samp>&rsquo;, or both &lsquo;<samp>.</samp>&rsquo;, or both
+&lsquo;<samp>=</samp>&rsquo;.  For example, a non-GNU <code>grep</code> might treat
+&lsquo;<samp>[:alpha:]</samp>&rsquo; like &lsquo;<samp>[[:alpha:]]</samp>&rsquo;, or like &lsquo;<samp>[:ahlp]</samp>&rsquo;.
 </li></ul>
 
-<span id="index-interval-expressions-1"></span>
-<p>Traditional <code>egrep</code> did not support interval expressions and
-some <code>egrep</code> implementations use &lsquo;<samp>\{</samp>&rsquo; and &lsquo;<samp>\}</samp>&rsquo; instead, so
-portable scripts should avoid interval expressions in &lsquo;<samp>grep&nbsp;-E</samp>&rsquo; patterns
-and should use &lsquo;<samp>[{]</samp>&rsquo; to match a literal &lsquo;<samp>{</samp>&rsquo;.
-</p>
-<p>GNU <code>grep&nbsp;-E</code> attempts to support traditional usage by
-assuming that &lsquo;<samp>{</samp>&rsquo; is not special if it would be the start of an
-invalid interval expression.
-For example, the command
-&lsquo;<samp>grep&nbsp;-E&nbsp;'{1'</samp>&rsquo; searches for the two-character string &lsquo;<samp>{1</samp>&rsquo;
-instead of reporting a syntax error in the regular expression.
-POSIX allows this behavior as an extension, but portable scripts
-should avoid it.
+<p>The following constructs have well-defined behavior in GNU
+<code>grep</code>.  However, they have unspecified behavior elsewhere, so
+portable scripts should avoid them:
+</p>
+<ul>
+<li> Special backslash expressions like &lsquo;<samp>\b</samp>&rsquo;, &lsquo;<samp>\&lt;</samp>&rsquo;, and &lsquo;<samp>\]</samp>&rsquo;.
+See <a href="#Special-Backslash-Expressions">Special Backslash Expressions</a>.
+
+</li><li> A basic regular expression that uses &lsquo;<samp>\?</samp>&rsquo;, &lsquo;<samp>\+</samp>&rsquo;, or &lsquo;<samp>\|</samp>&rsquo;.
+
+</li><li> An extended regular expression that uses back-references.
+
+</li><li> An empty regular expression, subexpression, or alternative.  For
+example, &lsquo;<samp>(a|bc|)</samp>&rsquo; is not portable; a portable equivalent is
+&lsquo;<samp>(a|bc)?</samp>&rsquo;.
+
+</li><li> In a basic regular expression, an anchoring &lsquo;<samp>^</samp>&rsquo; that appears
+directly after &lsquo;<samp>\(</samp>&rsquo;, or an anchoring &lsquo;<samp>$</samp>&rsquo; that appears
+directly before &lsquo;<samp>\)</samp>&rsquo;.
+
+</li><li> In a basic regular expression, a repetition operator that
+directly follows another repetition operator.
+
+</li><li> In an extended regular expression, unescaped &lsquo;<samp>{</samp>&rsquo;
+that does not begin a valid interval expression.
+GNU <code>grep</code> treats the &lsquo;<samp>{</samp>&rsquo; as an ordinary character.
+
+</li><li> A null character or an encoding error in either pattern or input data.
+See <a href="#Character-Encoding">Character Encoding</a>.
+
+</li><li> An input file that ends in a non-newline character,
+where GNU <code>grep</code> silently supplies a newline.
+</li></ul>
+
+<p>The following constructs have unspecified behavior, in both GNU
+and other <code>grep</code> implementations.  Scripts should avoid
+them whenever possible.
 </p>
+<ul>
+<li> A backslash escaping an ordinary character, unless it is a
+back-reference like &lsquo;<samp>\1</samp>&rsquo; or a special backslash expression like
+&lsquo;<samp>\&lt;</samp>&rsquo; or &lsquo;<samp>\b</samp>&rsquo;.  See <a href="#Special-Backslash-Expressions">Special Backslash Expressions</a>.  For
+example, &lsquo;<samp>\x</samp>&rsquo; has unspecified behavior now, and a future version
+of <code>grep</code> might specify &lsquo;<samp>\x</samp>&rsquo; to have a new behavior.
+
+</li><li> A repetition operator that appears directly after an anchor, or at the
+start of a complete regular expression, parenthesized subexpression,
+or alternative.  For example, &lsquo;<samp>+|^*(+a|?-b)</samp>&rsquo; has unspecified
+behavior, whereas &lsquo;<samp>\+|^\*(\+a|\?-b)</samp>&rsquo; is portable.
+
+</li><li> A range expression outside the POSIX locale.  For example, in some
+locales &lsquo;<samp>[a-z]</samp>&rsquo; might match some characters that are not
+lowercase letters, or might not match some lowercase letters, or might
+be invalid.  With GNU <code>grep</code> it is not documented whether
+these range expressions use native code points, or use the collating
+sequence specified by the <code>LC_COLLATE</code> category, or have some
+other interpretation.  Outside the POSIX locale, it is portable to use
+&lsquo;<samp>[[:lower:]]</samp>&rsquo; to match a lower-case letter, or
+&lsquo;<samp>[abcdefghijklmnopqrstuvwxyz]</samp>&rsquo; to match an ASCII lower-case
+letter.
+
+</li></ul>
+
 <hr>
 </div>
 <div class="section" id="Character-Encoding">
 <div class="header">
 <p>
-Next: <a href="#Matching-Non_002dASCII" accesskey="n" rel="next">Matching Non-ASCII and Non-printable Characters</a>, Previous: <a href="#Basic-vs-Extended" accesskey="p" rel="prev">Basic vs Extended Regular Expressions</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> &nbsp; [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
+Next: <a href="#Matching-Non_002dASCII" accesskey="n" rel="next">Matching Non-ASCII and Non-printable Characters</a>, Previous: <a href="#Problematic-Expressions" accesskey="p" rel="prev">Problematic Regular Expressions</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> &nbsp; [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
 </div>
-<span id="Character-Encoding-1"></span><h3 class="section">3.7 Character Encoding</h3>
+<span id="Character-Encoding-1"></span><h3 class="section">3.8 Character Encoding</h3>
 <span id="index-character-encoding"></span>
 
 <p>The <code>LC_CTYPE</code> locale specifies the encoding of characters in
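
The categories in the Problematic Expressions hunk above can be probed from a shell. The following is a minimal sketch, not part of the commit, and it assumes a POSIX-style shell with a conforming grep on the PATH. The sample input strings and the variable names (status, count, lower) are made up for illustration.

```shell
# Exit status convention: 0 = a line matched, 1 = no match, 2 = error.
# 'xy\1' has no parenthesized subexpression for \1 to refer to.
status=0
printf 'xyx\n' | grep 'xy\1' >/dev/null 2>&1 || status=$?
printf 'invalid back-reference: exit status %s\n' "$status"

# Portable replacement for the empty alternative in '(a|bc|)'.
# -x requires the whole line to match, -c prints the number of matching lines.
count=$(printf 'a\nbc\n\nabc\n' | grep -c -E -x '(a|bc)?')
printf 'lines matched by (a|bc)?: %s\n' "$count"

# A character class avoids locale-dependent ranges such as [a-z].
lower=$(printf 'a\nB\n' | LC_ALL=C grep '[[:lower:]]')
printf 'lines containing a lower-case letter: %s\n' "$lower"
```

Checking the exit status instead of parsing the diagnostic keeps the probe independent of message wording and locale.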
@@ -1780,7 +1907,7 @@
 <p>
 Previous: <a href="#Character-Encoding" accesskey="p" rel="prev">Character Encoding</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> &nbsp; [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
 </div>
-<span id="Matching-Non_002dASCII-and-Non_002dprintable-Characters"></span><h3 class="section">3.8 Matching Non-ASCII and Non-printable Characters</h3>
+<span id="Matching-Non_002dASCII-and-Non_002dprintable-Characters"></span><h3 class="section">3.9 Matching Non-ASCII and Non-printable Characters</h3>
 <span id="index-non_002dASCII-matching"></span>
 <span id="index-non_002dprintable-matching"></span>
 
@@ -1909,24 +2036,36 @@
 </pre></div>
 
 </li><li> What if a pattern or file has a leading &lsquo;<samp>-</samp>&rsquo;?
+For example:
 
 <div class="example">
-<pre class="example">grep -- '--cut here--' *
+<pre class="example">grep &quot;$pattern&quot; *
 </pre></div>
 
-<p>searches for all lines matching &lsquo;<samp>--cut here--</samp>&rsquo;.
-Without <samp>--</samp>,
-<code>grep</code> would attempt to parse &lsquo;<samp>--cut here--</samp>&rsquo; as a list of
-options, and there would be similar problems with any file names
-beginning with &lsquo;<samp>-</samp>&rsquo;.
+<p>can behave unexpectedly if the value of &lsquo;<samp>pattern</samp>&rsquo; begins with &lsquo;<samp>-</samp>&rsquo;,
+or if the &lsquo;<samp>*</samp>&rsquo; expands to a file name with leading &lsquo;<samp>-</samp>&rsquo;.
+To avoid the problem, you can use <samp>-e</samp> for patterns and leading
+&lsquo;<samp>./</samp>&rsquo; for files:
 </p>
-<p>Alternatively, you can prevent misinterpretation of leading &lsquo;<samp>-</samp>&rsquo;
-by using <samp>-e</samp> for patterns and leading &lsquo;<samp>./</samp>&rsquo; for files:
+<div class="example">
+<pre class="example">grep -e &quot;$pattern&quot; ./*
+</pre></div>
+
+<p>searches for all lines matching the pattern in all the working
+directory&rsquo;s files whose names do not begin with &lsquo;<samp>.</samp>&rsquo;.
+Without the <samp>-e</samp>, <code>grep</code> might treat the pattern as an
+option if it begins with &lsquo;<samp>-</samp>&rsquo;.  Without the &lsquo;<samp>./</samp>&rsquo;, there might
+be similar problems with file names beginning with &lsquo;<samp>-</samp>&rsquo;.
+</p>
+<p>Alternatively, you can use &lsquo;<samp>--</samp>&rsquo; before the pattern and file names:
 </p>
 <div class="example">
-<pre class="example">grep -e '--cut here--' ./*
+<pre class="example">grep -- &quot;$pattern&quot; *
 </pre></div>
 
+<p>This also fixes the problem, except that if there is a file named &lsquo;<samp>-</samp>&rsquo;,
+<code>grep</code> misinterprets the &lsquo;<samp>-</samp>&rsquo; as standard input.
+</p>
 </li><li> Suppose I want to search for a whole word, not a part of a word?
 
 <div class="example">
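
The leading-hyphen advice in the hunk above can be tried in a scratch directory. This is only a sketch: it assumes mktemp -d is available (it is common but not required by POSIX), and the file name -notes.txt and its contents are hypothetical.

```shell
# Work in a scratch directory so the example does not touch real files.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Create a file whose name begins with '-' and that contains the pattern.
printf '%s\n' '--cut here--' 'body text' > ./-notes.txt

pattern='--cut here--'
# -e marks the next argument as a pattern even though it begins with '-';
# the ./ prefix keeps the expanded file name from looking like an option.
found=$(grep -e "$pattern" ./*)
printf '%s\n' "$found"

# Leave and remove the scratch directory.
cd / && rm -rf -- "$dir"
```

With only one file operand, grep prints the matching line without a file name prefix.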
@@ -2000,8 +2139,7 @@
 <samp>-a</samp> or &lsquo;<samp>--binary-files=text</samp>&rsquo; option.
 To eliminate the
 &ldquo;Binary file matches&rdquo; messages, use the <samp>-I</samp> or
-&lsquo;<samp>--binary-files=without-match</samp>&rsquo; option,
-or the <samp>-s</samp> or <samp>--no-messages</samp> option.
+&lsquo;<samp>--binary-files=without-match</samp>&rsquo; option.
 </p>
 </li><li> Why doesn&rsquo;t &lsquo;<samp>grep -lv</samp>&rsquo; print non-matching file names?
 
@@ -2029,7 +2167,10 @@
 </p>
 <p>To match empty lines, use the pattern &lsquo;<samp>^$</samp>&rsquo;.  To match blank
 lines, use the pattern &lsquo;<samp>^[[:blank:]]*$</samp>&rsquo;.  To match no lines at
-all, use the command &lsquo;<samp>grep -f /dev/null</samp>&rsquo;.
+all, use an extended regular expression like &lsquo;<samp>a^</samp>&rsquo; or &lsquo;<samp>$a</samp>&rsquo;.
+To match every line, a portable script should use a pattern like
+&lsquo;<samp>^</samp>&rsquo; instead of the empty pattern, as POSIX does not specify the
+behavior of the empty pattern.
 </p>
 </li><li> How can I search in both standard input and in files?
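
The patterns recommended in the hunk above for matching no lines and every line can be checked by counting matches. A minimal sketch, assuming a POSIX-style shell; the three-line sample input and the variable names are illustrative only.

```shell
# In an ERE, '^' is always an anchor, so 'a^' would need a line start
# right after an 'a', which cannot happen: no line is selected.
# grep exits with status 1 when nothing matches, hence the '|| true'.
none=$(printf 'alpha\n\nbeta\n' | grep -c -E 'a^' || true)

# '^' matches at the start of every line, including the empty one.
all=$(printf 'alpha\n\nbeta\n' | grep -c '^')

printf 'a^ selected %s lines; ^ selected %s lines\n' "$none" "$all"
```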
 
@@ -2039,6 +2180,21 @@
 <pre class="example">cat /etc/passwd | grep 'alain' - /etc/motd
 </pre></div>
 
+</li><li> Why can&rsquo;t I combine the shell&rsquo;s &lsquo;<samp>set -e</samp>&rsquo; with <code>grep</code>?
+
+<p>The <code>grep</code> command follows the convention of programs like
+<code>cmp</code> and <code>diff</code> where an exit status of 1 is not an
+error.  The shell command &lsquo;<samp>set -e</samp>&rsquo; causes the shell to exit if
+any subcommand exits with nonzero status, and this will cause the
+shell to exit merely because <code>grep</code> selected no lines,
+which is ordinarily not what you want.
+</p>
+<p>There is a related problem with Bash&rsquo;s <code>set -e -o pipefail</code>.
+Since <code>grep</code> does not always read all its input, a command
+outputting to a pipe read by <code>grep</code> can fail when
+<code>grep</code> exits before reading all its input, and the command&rsquo;s
+failure can cause Bash to exit.
+</p>
 </li><li> Why is this back-reference failing?
 
 <div class="example">
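
One common way to keep 'set -e' from aborting a script when grep merely selects no lines is to capture the exit status on the right-hand side of '||'. The manual text in the hunk above does not prescribe this; it is a sketch of one possible workaround, with hypothetical input and variable names.

```shell
result=$(
  set -e
  status=0
  # A command on the left of '||' does not trigger 'set -e',
  # so the status can be recorded and inspected afterwards.
  printf 'one\ntwo\n' | grep -q 'three' || status=$?
  # Status 1 only means "no lines selected"; treat larger values as errors.
  [ "$status" -le 1 ] || exit "$status"
  printf 'reached end with grep status %s' "$status"
)
printf '%s\n' "$result"
```

Running the 'set -e' part in a command substitution keeps the option from leaking into the calling shell.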
@@ -2069,7 +2225,7 @@
 <code>sed</code>, <code>perl</code>, or many other utilities that are
 designed to operate across lines.
 </p>
-</li><li> What do <code>grep</code>, <code>fgrep</code>, and <code>egrep</code> stand for?
+</li><li> What do <code>grep</code>, <samp>-E</samp>, and <samp>-F</samp> stand for?
 
 <p>The name <code>grep</code> comes from the way line editing was done on Unix.
 For example,
@@ -2081,9 +2237,29 @@
 g/re/p
 </pre></div>
 
-<p><code>fgrep</code> stands for Fixed <code>grep</code>;
-<code>egrep</code> stands for Extended <code>grep</code>.
+<p>The <samp>-E</samp> option stands for Extended <code>grep</code>.
+The <samp>-F</samp> option stands for Fixed <code>grep</code>.
+</p>
+</li><li> What happened to <code>egrep</code> and <code>fgrep</code>?
+
+<p>7th Edition Unix had commands <code>egrep</code> and <code>fgrep</code>
+that were the counterparts of the modern &lsquo;<samp>grep -E</samp>&rsquo; and &lsquo;<samp>grep -F</samp>&rsquo;.
+Although breaking up <code>grep</code> into three programs was perhaps
+useful on the small computers of the 1970s, <code>egrep</code> and
+<code>fgrep</code> were not standardized by POSIX and are no longer needed.
+In the current GNU implementation, <code>egrep</code> and <code>fgrep</code>
+issue a warning and then act like their modern counterparts;
+eventually, they are planned to be removed entirely.
+</p>
+<p>If you prefer the old names, you can use your own substitutes,
+such as a shell script named <code>egrep</code> with the following
+contents:
 </p>
+<div class="example">
+<pre class="example">#!/bin/sh
+exec grep -E &quot;$@&quot;
+</pre></div>
+
 </li></ol>
 
 
@@ -2125,6 +2301,17 @@
 surprisingly inefficient due to difficulties in fast portable access to
 concepts like multi-character collating elements.
 </p>
+<span id="index-interval-expressions-1"></span>
+<p>Interval expressions may be implemented internally via repetition.
+For example, &lsquo;<samp>^(a|bc){2,4}$</samp>&rsquo; might be implemented as
+&lsquo;<samp>^(a|bc)(a|bc)((a|bc)(a|bc)?)?$</samp>&rsquo;.  A large repetition count may
+exhaust memory or greatly slow matching.  Even small counts can cause
+problems if cascaded; for example, &lsquo;<samp>grep -E
+&quot;.*{10,}{10,}{10,}{10,}{10,}&quot;</samp>&rsquo; is likely to overflow a
+stack.  Fortunately, regular expressions like these are typically
+artificial, and cascaded repetitions do not conform to POSIX so cannot
+be used in portable programs anyway.
+</p>
 <span id="index-back_002dreferences"></span>
 <p>A back-reference such as &lsquo;<samp>\1</samp>&rsquo; can hurt performance significantly
 in some cases, since back-references cannot in general be implemented
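
The interval-to-repetition rewriting described in the hunk above can be checked on a handful of inputs. This sketch only compares the two patterns quoted in the manual text; the sample strings are made up, and it says nothing about how any grep implements intervals internally.

```shell
# Sample lines: some are 2 to 4 repetitions of 'a' or 'bc', others are not.
samples=$(printf '%s\n' a aa abc bcbc aabca aaaaa bcabcabc)

# The interval form and its expansion into plain repetition.
short=$(printf '%s\n' "$samples" | grep -E '^(a|bc){2,4}$')
long=$(printf '%s\n' "$samples" | grep -E '^(a|bc)(a|bc)((a|bc)(a|bc)?)?$')

if [ "$short" = "$long" ]; then
  printf 'both patterns select the same lines:\n%s\n' "$short"
else
  printf 'patterns disagree\n'
fi
```

The expanded pattern grows with the repetition count, which is one way to picture why large or cascaded counts get expensive.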
@@ -2145,6 +2332,14 @@
 <samp>-a</samp> (<samp>--binary-files=text</samp>) option is used (see <a href="#File-and-Directory-Selection">File and Directory Selection</a>), unless the <samp>-z</samp> (<samp>--null-data</samp>)
 option is also used (see <a href="#Other-Options">Other Options</a>).
 </p>
+<span id="index-pipelines-and-reading"></span>
+<p>For efficiency <code>grep</code> does not always read all its input.
+For example, the shell command &lsquo;<samp>sed '/^...$/d' | grep -q X</samp>&rsquo; can
+cause <code>grep</code> to exit immediately after reading a line
+containing &lsquo;<samp>X</samp>&rsquo;, without bothering to read the rest of its input data.
+This in turn can cause <code>sed</code> to exit with a nonzero status because
+<code>sed</code> cannot write to its output pipe after <code>grep</code> exits.
+</p>
 <p>For more about the algorithms used by <code>grep</code> and about
 related string matching algorithms, see:
 </p>
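
The early-exit effect described in the hunk above can be observed by recording the writer's exit status. This is a rough sketch that substitutes the 'yes' utility for the sed command in the manual text and assumes mktemp is available. The exact nonzero status depends on how the writer is terminated, so the sketch only prints it.

```shell
tmp=$(mktemp) || exit 1

# 'yes X' writes forever; grep -q stops at the first match and exits,
# after which the writer can no longer write to the pipe.
{ yes X 2>/dev/null; printf '%s\n' "$?" > "$tmp"; } | grep -q X
reader_status=$?

writer_status=$(cat "$tmp")
rm -f "$tmp"
printf 'grep exit status: %s, writer exit status: %s\n' "$reader_status" "$writer_status"
```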
@@ -2157,20 +2352,33 @@
 
 </li><li> Aho AV, Corasick MJ. Efficient string matching: an aid to bibliographic search.
 <em>CACM</em>. 1975;18(6):333&ndash;40.
-<a href="https://dx.doi.org/10.1145/360825.360855">https://dx.doi.org/10.1145/360825.360855</a>.
+<a href="https://doi.org/10.1145/360825.360855">https://doi.org/10.1145/360825.360855</a>.
 This introduces the Aho&ndash;Corasick algorithm.
 
 </li><li> Boyer RS, Moore JS. A fast string searching algorithm.
 <em>CACM</em>. 1977;20(10):762&ndash;72.
-<a href="https://dx.doi.org/10.1145/359842.359859">https://dx.doi.org/10.1145/359842.359859</a>.
+<a href="https://doi.org/10.1145/359842.359859">https://doi.org/10.1145/359842.359859</a>.
 This introduces the Boyer&ndash;Moore algorithm.
 
 </li><li> Faro S, Lecroq T. The exact online string matching problem: a review
 of the most recent results.
 <em>ACM Comput Surv</em>. 2013;45(2):13.
-<a href="https://dx.doi.org/10.1145/2431211.2431212">https://dx.doi.org/10.1145/2431211.2431212</a>.
+<a href="https://doi.org/10.1145/2431211.2431212">https://doi.org/10.1145/2431211.2431212</a>.
 This surveys string matching algorithms that might help improve the
 performance of <code>grep</code> in the future.
+
+</li><li> Hakak SI, Kamsin A, Shivakumara P, Gilkar GA, Khan WZ, Imran M.
+Exact string matching algorithms: survey, issues, and future research directions.
+<em>IEEE Access</em>. 2019;7:69614&ndash;37.
+<a href="https://doi.org/10.1109/ACCESS.2019.2914071">https://doi.org/10.1109/ACCESS.2019.2914071</a>.
+This survey is more recent than Faro &amp; Lecroq,
+and focuses on taxonomy instead of performance.
+
+</li><li> Hume A, Sunday D. Fast string search.
+<em>Software Pract Exper</em>. 1991;21(11):1221&ndash;48.
+<a href="https://doi.org/10.1002/spe.4380211105">https://doi.org/10.1002/spe.4380211105</a>.
+This excellent albeit now-dated survey aided the initial development
+of <code>grep</code>.
 </li></ul>
 
 <hr>
@@ -2928,13 +3136,14 @@
 <tr><td></td><td valign="top"><a href="#index-alpha-character-class"><code>alpha <span class="roman">character class</span></code></a>:</td><td>&nbsp;</td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-alphabetic-characters">alphabetic characters</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-alphanumeric-characters">alphanumeric characters</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-alternatives-in-regular-expressions">alternatives in regular expressions</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-anchoring">anchoring</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Anchoring">Anchoring</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-asterisk">asterisk</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
 <tr><td colspan="4"> <hr></td></tr>
 <tr><th id="Index_cp_letter-B">B</th><td></td><td></td></tr>
 <tr><td></td><td valign="top"><a href="#index-back_002dreference">back-reference</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Back_002dreferences-and-Subexpressions">Back-references and Subexpressions</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-back_002dreferences">back-references</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Performance">Performance</a></td></tr>
-<tr><td></td><td valign="top"><a href="#index-backslash">backslash</a>:</td><td>&nbsp;</td><td valign="top"><a href="#The-Backslash-Character-and-Special-Expressions">The Backslash Character and Special Expressions</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-backslash">backslash</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Special-Backslash-Expressions">Special Backslash Expressions</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-basic-regular-expressions">basic regular expressions</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Basic-vs-Extended">Basic vs Extended</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-before-context">before context</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Context-Line-Control">Context Line Control</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-binary-files">binary files</a>:</td><td>&nbsp;</td><td valign="top"><a href="#File-and-Directory-Selection">File and Directory Selection</a></td></tr>
@@ -3012,7 +3221,8 @@
 <tr><th id="Index_cp_letter-I">I</th><td></td><td></td></tr>
 <tr><td></td><td valign="top"><a href="#index-include-files">include files</a>:</td><td>&nbsp;</td><td valign="top"><a href="#File-and-Directory-Selection">File and Directory Selection</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-interval-expressions">interval expressions</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
-<tr><td></td><td valign="top"><a href="#index-interval-expressions-1">interval expressions</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Basic-vs-Extended">Basic vs Extended</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-interval-expressions-1">interval expressions</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Performance">Performance</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-invalid-regular-expressions">invalid regular expressions</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Problematic-Expressions">Problematic Expressions</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-invert-matching">invert matching</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Matching-Control">Matching Control</a></td></tr>
 <tr><td colspan="4"> <hr></td></tr>
 <tr><th id="Index_cp_letter-L">L</th><td></td><td></td></tr>
@@ -3081,6 +3291,7 @@
 <tr><td></td><td valign="top"><a href="#index-patterns-option">patterns option</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Matching-Control">Matching Control</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-performance">performance</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Performance">Performance</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-period">period</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-pipelines-and-reading">pipelines and reading</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Performance">Performance</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-plus-sign">plus sign</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-POSIXLY_005fCORRECT-environment-variable"><code>POSIXLY_CORRECT <span class="roman">environment variable</span></code></a>:</td><td>&nbsp;</td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-print-character-class"><code>print <span class="roman">character class</span></code></a>:</td><td>&nbsp;</td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
@@ -3121,9 +3332,11 @@
 <tr><td colspan="4"> <hr></td></tr>
 <tr><th id="Index_cp_letter-T">T</th><td></td><td></td></tr>
 <tr><td></td><td valign="top"><a href="#index-tab_002daligned-content-lines">tab-aligned content lines</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Output-Line-Prefix-Control">Output Line Prefix Control</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-TERM-environment-variable"><code>TERM <span class="roman">environment variable</span></code></a>:</td><td>&nbsp;</td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-translation-of-message-language">translation of message language</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
 <tr><td colspan="4"> <hr></td></tr>
 <tr><th id="Index_cp_letter-U">U</th><td></td><td></td></tr>
+<tr><td></td><td valign="top"><a href="#index-unspecified-behavior-in-regular-expressions">unspecified behavior in regular expressions</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Problematic-Expressions">Problematic Expressions</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-upper-character-class"><code>upper <span class="roman">character class</span></code></a>:</td><td>&nbsp;</td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-upper_002dcase-letters">upper-case letters</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-usage-summary_002c-printing">usage summary, printing</a>:</td><td>&nbsp;</td><td valign="top"><a href="#Generic-Program-Information">Generic Program Information</a></td></tr>
@@ -3212,14 +3425,6 @@
 
 </div>
 </div>
-<div class="footnote">
-<hr>
-<h4 class="footnotes-heading">Footnotes</h4>
-
-<h5><a id="FOOT1" href="#DOCF1">(1)</a></h5>
-<p>Of course, 7th Edition
-Unix predated POSIX by several years!</p>
-</div>
 
 
 


