Changes to grep/manual/grep.html,v
From: Jim Meyering
Subject: Changes to grep/manual/grep.html,v
Date: Sat, 3 Sep 2022 15:33:15 -0400 (EDT)
CVSROOT: /webcvs/grep
Module name: grep
Changes by: Jim Meyering <meyering> 22/09/03 15:33:15
Index: grep.html
===================================================================
RCS file: /webcvs/grep/grep/manual/grep.html,v
retrieving revision 1.32
retrieving revision 1.33
diff -u -b -r1.32 -r1.33
--- grep.html 14 Aug 2021 20:46:39 -0000 1.32
+++ grep.html 3 Sep 2022 19:33:13 -0000 1.33
@@ -5,7 +5,7 @@
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This manual is for grep, a pattern matching engine.
-Copyright (C) 1999-2002, 2005, 2008-2021 Free Software Foundation,
+Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
@@ -14,10 +14,10 @@
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled
"GNU Free Documentation License". -->
-<title>GNU Grep 3.7</title>
+<title>GNU Grep 3.8</title>
-<meta name="description" content="GNU Grep 3.7">
-<meta name="keywords" content="GNU Grep 3.7">
+<meta name="description" content="GNU Grep 3.8">
+<meta name="keywords" content="GNU Grep 3.8">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
@@ -53,7 +53,7 @@
</head>
<body lang="en">
-<h1 class="settitle" align="center">GNU Grep 3.7</h1>
+<h1 class="settitle" align="center">GNU Grep 3.8</h1>
@@ -71,11 +71,11 @@
<p><code>grep</code> prints lines that contain a match for one or more
patterns.
</p>
-<p>This manual is for version 3.7 of GNU Grep.
+<p>This manual is for version 3.8 of GNU Grep.
</p>
<p>This manual is for <code>grep</code>, a pattern matching engine.
</p>
-<p>Copyright © 1999–2002, 2005, 2008–2021 Free Software
Foundation,
+<p>Copyright © 1999–2002, 2005, 2008–2022 Free Software
Foundation,
Inc.
</p>
<blockquote>
@@ -116,12 +116,13 @@
<ul class="no-bullet">
<li><a id="toc-Fundamental-Structure-1" href="#Fundamental-Structure">3.1
Fundamental Structure</a></li>
<li><a id="toc-Character-Classes-and-Bracket-Expressions-1"
href="#Character-Classes-and-Bracket-Expressions">3.2 Character Classes and
Bracket Expressions</a></li>
- <li><a id="toc-The-Backslash-Character-and-Special-Expressions-1"
href="#The-Backslash-Character-and-Special-Expressions">3.3 The Backslash
Character and Special Expressions</a></li>
+ <li><a id="toc-Special-Backslash-Expressions-1"
href="#Special-Backslash-Expressions">3.3 Special Backslash Expressions</a></li>
<li><a id="toc-Anchoring-1" href="#Anchoring">3.4 Anchoring</a></li>
<li><a id="toc-Back_002dreferences-and-Subexpressions-1"
href="#Back_002dreferences-and-Subexpressions">3.5 Back-references and
Subexpressions</a></li>
<li><a id="toc-Basic-vs-Extended-Regular-Expressions"
href="#Basic-vs-Extended">3.6 Basic vs Extended Regular Expressions</a></li>
- <li><a id="toc-Character-Encoding-1" href="#Character-Encoding">3.7
Character Encoding</a></li>
- <li><a id="toc-Matching-Non_002dASCII-and-Non_002dprintable-Characters"
href="#Matching-Non_002dASCII">3.8 Matching Non-ASCII and Non-printable
Characters</a></li>
+ <li><a id="toc-Problematic-Regular-Expressions"
href="#Problematic-Expressions">3.7 Problematic Regular Expressions</a></li>
+ <li><a id="toc-Character-Encoding-1" href="#Character-Encoding">3.8
Character Encoding</a></li>
+ <li><a id="toc-Matching-Non_002dASCII-and-Non_002dprintable-Characters"
href="#Matching-Non_002dASCII">3.9 Matching Non-ASCII and Non-printable
Characters</a></li>
</ul></li>
<li><a id="toc-Usage-1" href="#Usage">4 Usage</a></li>
<li><a id="toc-Performance-1" href="#Performance">5 Performance</a></li>
@@ -342,7 +343,7 @@
regular expression with ‘<samp>\<</samp>’ and
‘<samp>\></samp>’. For example, although
‘<samp>grep -w @</samp>’ matches a line containing only
‘<samp>@</samp>’, ‘<samp>grep
'\<@\>'</samp>’ cannot match any line because
‘<samp>@</samp>’ is not a
-word constituent. See <a
href="#The-Backslash-Character-and-Special-Expressions">The Backslash Character
and Special Expressions</a>.
+word constituent. See <a href="#Special-Backslash-Expressions">Special
Backslash Expressions</a>.
</p>
</dd>
<dt id='index-_002dx'><span><samp>-x</samp><a href='#index-_002dx'
class='copiable-anchor'> ¶</a></span></dt>
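The hunk above contrasts 'grep -w @' with 'grep '\<@\>''. Here is a minimal sketch that reproduces the contrast. It assumes GNU grep, where '\<' and '\>' match the empty string at the start and end of a word, and a POSIX shell.

```shell
#!/bin/sh
# Assumes GNU grep: '\<' and '\>' match the empty string at the start
# and end of a word, and '@' is not a word constituent.

w_status=0
printf '@\n' | grep -w @ >/dev/null || w_status=$?

anchor_status=0
printf '@\n' | grep '\<@\>' >/dev/null || anchor_status=$?

printf '%s exit status: %d\n' 'grep -w @' "$w_status"         # expected 0
printf '%s exit status: %d\n' "grep '\<@\>'" "$anchor_status"  # expected 1
```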
@@ -382,7 +383,7 @@
<dt><span><samp>--colour[=<var>WHEN</var>]</samp></span></dt>
<dd><span id="index-_002d_002dcolour"></span>
<span id="index-highlight_002c-color_002c-colour"></span>
-<p>Surround the matched (non-empty) strings, matching lines, context lines,
+<p>Surround matched non-empty strings, matching lines, context lines,
file names, line numbers, byte offsets, and separators (for fields and
groups of context lines) with escape sequences to display them in color
on the terminal.
@@ -390,11 +391,14 @@
and default to
‘<samp>ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36</samp>’
for bold red matched text, magenta file names, green line numbers,
green byte offsets, cyan separators, and default terminal colors otherwise.
-The deprecated environment variable <code>GREP_COLOR</code> is still supported,
-but its setting does not have priority;
-it defaults to ‘<samp>01;31</samp>’ (bold red)
-which only covers the color for matched text.
-<var>WHEN</var> is ‘<samp>never</samp>’,
‘<samp>always</samp>’, or ‘<samp>auto</samp>’.
+See <a href="#Environment-Variables">Environment Variables</a>.
+</p>
+<p><var>WHEN</var> is ‘<samp>always</samp>’ to use colors,
‘<samp>never</samp>’ to not use
+colors, or ‘<samp>auto</samp>’ to use colors if standard output is
associated
+with a terminal device and the <code>TERM</code> environment variable’s
value
+suggests that the terminal supports colors.
+Plain <samp>--color</samp> is treated like <samp>--color=auto</samp>;
+if no <samp>--color</samp> option is given, the default is
<samp>--color=never</samp>.
</p>
</dd>
<dt id='index-_002dL'><span><samp>-L</samp><a href='#index-_002dL'
class='copiable-anchor'> ¶</a></span></dt>
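The new wording in this hunk spells out what each WHEN value does. The rough check below assumes GNU grep. When standard output is a pipe, as it is inside a command substitution, 'auto' and 'never' should produce plain text, while 'always' still emits escape sequences.

```shell
#!/bin/sh
# Assumes GNU grep. Inside $(...), grep's standard output is a pipe,
# so --color=auto must not colorize; --color=always colorizes anyway.
esc=$(printf '\033')

always=$(printf 'foo\n' | grep --color=always foo)
auto=$(printf 'foo\n' | grep --color=auto foo)
never=$(printf 'foo\n' | grep --color=never foo)

case $always in
  *"$esc"*) echo 'always: output contains escape sequences' ;;
  *)        echo 'always: output is plain' ;;
esac
[ "$auto" = foo ] && echo 'auto: output is plain'
[ "$never" = foo ] && echo 'never: output is plain'
```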
@@ -422,7 +426,11 @@
<dd><span id="index-_002d_002dmax_002dcount"></span>
<span id="index-max_002dcount"></span>
<p>Stop after the first <var>num</var> selected lines.
-If the input is standard input from a regular file,
+If <var>num</var> is zero, <code>grep</code> stops right away without reading
input.
+A <var>num</var> of -1 is treated as infinity and <code>grep</code>
+does not stop; this is the default.
+</p>
+<p>If the input is standard input from a regular file,
and <var>num</var> selected lines are output,
<code>grep</code> ensures that the standard input is positioned
just after the last selected line before exiting,
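This hunk documents '-m 0' and keeps the note about repositioning standard input. The sketch below exercises both behaviors. It assumes GNU grep and the availability of mktemp(1) for a scratch file. The sample lines are made up.

```shell
#!/bin/sh
# Assumes GNU grep and mktemp(1). The scratch file must be a regular
# file so that grep can reposition standard input before exiting.
tmp=$(mktemp)
printf 'a1\nb\na2\nc\n' > "$tmp"

# -m 0: grep stops without selecting anything, so it exits with status 1.
m0_status=0
grep -m 0 a "$tmp" >/dev/null || m0_status=$?

# -m 1 with standard input redirected from the file: grep leaves the
# offset just after the first selected line, so cat prints the rest.
rest=$( { grep -m 1 a; cat; } < "$tmp" )

rm -f "$tmp"
printf 'exit status with -m 0: %d\n' "$m0_status"
printf '%s\n' "$rest"    # a1, b, a2, c on separate lines
```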
@@ -462,7 +470,7 @@
<dt><span><samp>--only-matching</samp></span></dt>
<dd><span id="index-_002d_002donly_002dmatching"></span>
<span id="index-only-matching"></span>
-<p>Print only the matched (non-empty) parts of matching lines,
+<p>Print only the matched non-empty parts of matching lines,
with each such part on a separate output line.
Output lines use the same delimiters as input, and delimiters are null
bytes if <samp>-z</samp> (<samp>--null-data</samp>) is also used (see <a
href="#Other-Options">Other Options</a>).
@@ -478,6 +486,9 @@
Exit immediately with zero status if any match is found,
even if an error was detected.
Also see the <samp>-s</samp> or <samp>--no-messages</samp> option.
+Portability note: Solaris 10 <code>grep</code> lacks <samp>-q</samp>;
+portable shell scripts typically can redirect standard output to
+<samp>/dev/null</samp> instead of using <samp>-q</samp>.
(<samp>-q</samp> is specified by POSIX.)
</p>
</dd>
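The portability note added here suggests redirecting standard output to /dev/null instead of using -q. Here is a minimal sketch of that idiom. The pattern and input lines are illustrative only.

```shell
#!/bin/sh
# Portable replacement for 'grep -q PATTERN': discard the selected
# lines and test only the exit status.
if printf 'alpha\nbeta\n' | grep beta >/dev/null; then
  found=yes
else
  found=no
fi

printf 'beta found: %s\n' "$found"
```

This form differs from -q in two ways. It reads all of its input, and an error in another input file gives exit status 2 even when a line was selected. As a result, the 'else' branch also covers errors.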
@@ -486,17 +497,6 @@
<dd><span id="index-_002d_002dno_002dmessages"></span>
<span id="index-suppress-error-messages"></span>
<p>Suppress error messages about nonexistent or unreadable files.
-Portability note:
-unlike GNU <code>grep</code>,
-7th Edition Unix <code>grep</code> did not conform to POSIX,
-because it lacked <samp>-q</samp>
-and its <samp>-s</samp> option behaved like
-GNU <code>grep</code>’s <samp>-q</samp> option.<a id="DOCF1"
href="#FOOT1"><sup>1</sup></a>
-USG-style <code>grep</code> also lacked <samp>-q</samp>
-but its <samp>-s</samp> option behaved like GNU <code>grep</code>’s.
-Portable shell scripts should avoid both
-<samp>-q</samp> and <samp>-s</samp> and should redirect
-standard and error output to <samp>/dev/null</samp> instead.
(<samp>-s</samp> is specified by POSIX.)
</p>
</dd>
@@ -710,7 +710,7 @@
suppresses output after null input binary data is discovered,
and suppresses output lines that contain improperly encoded data.
When some output is suppressed, <code>grep</code> follows any output
-with a one-line message saying that a binary file matches.
+with a message to standard error saying that a binary file matches.
</p>
<p>If <var>type</var> is ‘<samp>without-match</samp>’,
when <code>grep</code> discovers null input binary data
@@ -922,8 +922,10 @@
</div>
<span id="Environment-Variables-1"></span><h3 class="section">2.2 Environment
Variables</h3>
-<p>The behavior of <code>grep</code> is affected
-by the following environment variables.
+<p>The behavior of <code>grep</code> is affected by several environment
+variables, the most important of which control the locale, which
+specifies how <code>grep</code> interprets characters in its patterns and
+data.
</p>
<span id="index-LANGUAGE-environment-variable"></span>
<span id="index-LC_005fALL-environment-variable"></span>
@@ -935,8 +937,8 @@
in that order.
The first of these variables that is set specifies the locale.
For example, if <code>LC_ALL</code> is not set,
-but <code>LC_COLLATE</code> is set to ‘<samp>pt_BR</samp>’,
-then the Brazilian Portuguese locale is used
+but <code>LC_COLLATE</code> is set to ‘<samp>pt_BR.UTF-8</samp>’,
+then a Brazilian Portuguese locale is used
for the <code>LC_COLLATE</code> category.
As a special case for <code>LC_MESSAGES</code> only, the environment variable
<code>LANGUAGE</code> can contain a colon-separated list of languages that
@@ -948,7 +950,32 @@
with national language support (NLS).
The shell command <code>locale -a</code> lists locales that are currently
available.
</p>
-<p>Many of the environment variables in the following list let you
+<span id="index-environment-variables"></span>
+<p>The following environment variables affect the behavior of
<code>grep</code>.
+</p>
+<dl compact="compact">
+<dt
id='index-GREP_005fCOLOR-environment-variable'><span><code>GREP_COLOR</code><a
href='#index-GREP_005fCOLOR-environment-variable' class='copiable-anchor'>
¶</a></span></dt>
+<dd><span id="index-highlight-markers"></span>
+<p>This obsolescent variable interacts with <code>GREP_COLORS</code>
+confusingly, and <code>grep</code> warns if it is set and is not
+overridden by <code>GREP_COLORS</code>. Instead of
+‘<samp>GREP_COLOR='<var>color</var>'</samp>’, you can use
+‘<samp>GREP_COLORS='mt=<var>color</var>'</samp>’.
+</p>
+</dd>
+<dt
id='index-GREP_005fCOLORS-environment-variable'><span><code>GREP_COLORS</code><a
href='#index-GREP_005fCOLORS-environment-variable' class='copiable-anchor'>
¶</a></span></dt>
+<dd><span id="index-highlight-markers-1"></span>
+<p>This variable specifies the colors and other attributes
+used to highlight various parts of the output.
+Its value is a colon-separated list of <code>terminfo</code> capabilities
+that defaults to
‘<samp>ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36</samp>’
+with the ‘<samp>rv</samp>’ and ‘<samp>ne</samp>’
boolean capabilities omitted (i.e., false).
+The two-letter capability names
+refer to terminal “capabilities,” the ability
+of a terminal to highlight text, or change its color, and so on.
+These capabilities are stored in an online database and accessed by
+the <code>terminfo</code> library.
+Non-empty capability values
control highlighting using
Select Graphic Rendition (SGR)
commands interpreted by the terminal or terminal emulator.
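The new GREP_COLOR entry recommends GREP_COLORS='mt=color' in place of GREP_COLOR='color'. The small check below assumes GNU grep. It confirms that the mt= capability sets the escape sequence placed before matched text.

```shell
#!/bin/sh
# Assumes GNU grep. mt= sets both ms= and mc=, the SGR parameters for
# matched text; --color=always forces escapes even though stdout is a pipe.
esc=$(printf '\033')
out=$(printf 'foo\n' | GREP_COLORS='mt=01;32' grep --color=always foo)

case $out in
  *"${esc}[01;32m"*) echo 'match highlighted with SGR 01;32 (bold green)' ;;
  *)                 echo 'unexpected output' ;;
esac
```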
@@ -976,37 +1003,7 @@
and ‘<samp>48;5;0</samp>’ to ‘<samp>48;5;255</samp>’
for 88-color and 256-color modes background colors.
</p>
-<p>The two-letter names used in the <code>GREP_COLORS</code> environment
variable
-(and some of the others) refer to terminal “capabilities,” the
ability
-of a terminal to highlight text, or change its color, and so on.
-These capabilities are stored in an online database and accessed by
-the <code>terminfo</code> library.
-</p>
-<span id="index-environment-variables"></span>
-
-<dl compact="compact">
-<dt
id='index-GREP_005fCOLOR-environment-variable'><span><code>GREP_COLOR</code><a
href='#index-GREP_005fCOLOR-environment-variable' class='copiable-anchor'>
¶</a></span></dt>
-<dd><span id="index-highlight-markers"></span>
-<p>This variable specifies the color used to highlight matched (non-empty)
text.
-It is deprecated in favor of <code>GREP_COLORS</code>, but still supported.
-The ‘<samp>mt</samp>’, ‘<samp>ms</samp>’, and
‘<samp>mc</samp>’ capabilities of <code>GREP_COLORS</code>
-have priority over it.
-It can only specify the color used to highlight
-the matching non-empty text in any matching line
-(a selected line when the <samp>-v</samp> command-line option is omitted,
-or a context line when <samp>-v</samp> is specified).
-The default is ‘<samp>01;31</samp>’,
-which means a bold red foreground text on the terminal’s default
background.
-</p>
-</dd>
-<dt
id='index-GREP_005fCOLORS-environment-variable'><span><code>GREP_COLORS</code><a
href='#index-GREP_005fCOLORS-environment-variable' class='copiable-anchor'>
¶</a></span></dt>
-<dd><span id="index-highlight-markers-1"></span>
-<p>This variable specifies the colors and other attributes
-used to highlight various parts of the output.
-Its value is a colon-separated list of <code>terminfo</code> capabilities
-that defaults to
‘<samp>ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36</samp>’
-with the ‘<samp>rv</samp>’ and ‘<samp>ne</samp>’
boolean capabilities omitted (i.e., false).
-Supported capabilities are as follows.
+<p>Supported capabilities are as follows.
</p>
<dl compact="compact">
<dt id='index-sl-GREP_005fCOLORS-capability'><span><code>sl=</code><a
href='#index-sl-GREP_005fCOLORS-capability' class='copiable-anchor'>
¶</a></span></dt>
@@ -1116,7 +1113,7 @@
<span id="index-national-language-support"></span>
<span id="index-NLS"></span>
<p>These variables specify the locale for the <code>LC_COLLATE</code> category,
-which might affect how range expressions like ‘<samp>[a-z]</samp>’
are
+which might affect how range expressions like ‘<samp>a-z</samp>’
are
interpreted.
</p>
</dd>
@@ -1159,8 +1156,11 @@
by default,
such options are permuted to the front of the operand list
and are treated as options.
-Also, <code>POSIXLY_CORRECT</code> disables special handling of an
-invalid bracket expression. See <a
href="#invalid_002dbracket_002dexpr">invalid-bracket-expr</a>.
+</p>
+</dd>
+<dt id='index-TERM-environment-variable'><span><code>TERM</code><a
href='#index-TERM-environment-variable' class='copiable-anchor'>
¶</a></span></dt>
+<dd><p>This variable specifies the output terminal type, which can affect
+what the <samp>--color</samp> option does. See <a
href="#General-Output-Control">General Output Control</a>.
</p>
</dd>
<dt
id='index-_005fN_005fGNU_005fnonoption_005fargv_005fflags_005f-environment-variable'><span><code>_<var>N</var>_GNU_nonoption_argv_flags_</code><a
href='#index-_005fN_005fGNU_005fnonoption_005fargv_005fflags_005f-environment-variable'
class='copiable-anchor'> ¶</a></span></dt>
@@ -1269,15 +1269,6 @@
</dd>
</dl>
-<p>In addition,
-two variant programs <code>egrep</code> and <code>fgrep</code> are available.
-<code>egrep</code> is the same as ‘<samp>grep -E</samp>’.
-<code>fgrep</code> is the same as ‘<samp>grep -F</samp>’.
-Direct invocation as either
-<code>egrep</code> or <code>fgrep</code> is deprecated,
-but is provided to allow historical applications
-that rely on them to run unmodified.
-</p>
<hr>
</div>
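The paragraph removed in this hunk described egrep as 'grep -E' and fgrep as 'grep -F'. This brief sketch shows the two spellings that scripts can use instead. The sample data is made up.

```shell
#!/bin/sh
# grep -F: the pattern is a fixed string, so '.' matches only a period.
# grep -E: the pattern is an extended regular expression.
input=$(printf 'a.c\nabc\n')

fixed=$(printf '%s\n' "$input" | grep -F 'a.c')
extended=$(printf '%s\n' "$input" | grep -E 'a.c|xyz')

printf 'grep -F:\n%s\n' "$fixed"      # a.c
printf 'grep -E:\n%s\n' "$extended"   # a.c and abc
```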
@@ -1297,25 +1288,26 @@
three different versions of regular expression syntax:
basic (BRE), extended (ERE), and Perl-compatible (PCRE).
In GNU <code>grep</code>,
-there is no difference in available functionality between the basic and
-extended syntaxes.
+there is no difference in available functionality between basic and
+extended syntax.
In other implementations, basic regular expressions are less powerful.
The following description applies to extended regular expressions;
differences for basic regular expressions are summarized afterwards.
Perl-compatible regular expressions give additional functionality, and
-are documented in the <i>pcresyntax</i>(3) and <i>pcrepattern</i>(3) manual
+are documented in the <i>pcre2syntax</i>(3) and <i>pcre2pattern</i>(3) manual
pages, but work only if PCRE is available in the system.
</p>
<ul class="section-toc">
<li><a href="#Fundamental-Structure" accesskey="1">Fundamental
Structure</a></li>
<li><a href="#Character-Classes-and-Bracket-Expressions"
accesskey="2">Character Classes and Bracket Expressions</a></li>
-<li><a href="#The-Backslash-Character-and-Special-Expressions"
accesskey="3">The Backslash Character and Special Expressions</a></li>
+<li><a href="#Special-Backslash-Expressions" accesskey="3">Special Backslash
Expressions</a></li>
<li><a href="#Anchoring" accesskey="4">Anchoring</a></li>
<li><a href="#Back_002dreferences-and-Subexpressions"
accesskey="5">Back-references and Subexpressions</a></li>
<li><a href="#Basic-vs-Extended" accesskey="6">Basic vs Extended Regular
Expressions</a></li>
-<li><a href="#Character-Encoding" accesskey="7">Character Encoding</a></li>
-<li><a href="#Matching-Non_002dASCII" accesskey="8">Matching Non-ASCII and
Non-printable Characters</a></li>
+<li><a href="#Problematic-Expressions" accesskey="7">Problematic Regular
Expressions</a></li>
+<li><a href="#Character-Encoding" accesskey="8">Character Encoding</a></li>
+<li><a href="#Matching-Non_002dASCII" accesskey="9">Matching Non-ASCII and
Non-printable Characters</a></li>
</ul>
<hr>
<div class="section" id="Fundamental-Structure">
@@ -1396,9 +1388,10 @@
matches any string formed by concatenating two substrings
that respectively match the concatenated expressions.
</p>
-<p>Two regular expressions may be joined by the infix operator
‘<samp>|</samp>’;
-the resulting regular expression
-matches any string matching either alternate expression.
+<span id="index-alternatives-in-regular-expressions"></span>
+<p>Two regular expressions may be joined by the infix operator
‘<samp>|</samp>’.
+The resulting regular expression matches any string matching either of
+the two expressions, which are called <em>alternatives</em>.
</p>
<p>Repetition takes precedence over concatenation,
which in turn takes precedence over alternation.
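The hunk above introduces the term "alternatives" and restates the precedence rules. The short sketch below assumes GNU grep. The BRE form '\|' is a GNU extension, and the portability list later in the manual advises against it.

```shell
#!/bin/sh
# Assumes GNU grep.
input=$(printf 'cat\ndog\nbird\n')

ere=$(printf '%s\n' "$input" | grep -E 'cat|dog')   # ERE alternation
bre=$(printf '%s\n' "$input" | grep 'cat\|dog')     # GNU BRE extension

# Concatenation binds tighter than alternation, so '^ab|cd$' means
# (^ab)|(cd$), not ^(ab|cd)$.
prec=$(printf 'abx\nxcd\nxabx\n' | grep -E '^ab|cd$')

printf '%s\n' "$ere" "$bre" "$prec"
```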
@@ -1406,12 +1399,15 @@
to override these precedence rules and form a subexpression.
An unmatched ‘<samp>)</samp>’ matches just itself.
</p>
+<p>Not every character string is a valid regular expression.
+See <a href="#Problematic-Expressions">Problematic Regular Expressions</a>.
+</p>
<hr>
</div>
<div class="section" id="Character-Classes-and-Bracket-Expressions">
<div class="header">
<p>
-Next: <a href="#The-Backslash-Character-and-Special-Expressions" accesskey="n"
rel="next">The Backslash Character and Special Expressions</a>, Previous: <a
href="#Fundamental-Structure" accesskey="p" rel="prev">Fundamental
Structure</a>, Up: <a href="#Regular-Expressions" accesskey="u"
rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table
of contents" rel="contents">Contents</a>][<a href="#Index" title="Index"
rel="index">Index</a>]</p>
+Next: <a href="#Special-Backslash-Expressions" accesskey="n"
rel="next">Special Backslash Expressions</a>, Previous: <a
href="#Fundamental-Structure" accesskey="p" rel="prev">Fundamental
Structure</a>, Up: <a href="#Regular-Expressions" accesskey="u"
rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table
of contents" rel="contents">Contents</a>][<a href="#Index" title="Index"
rel="index">Index</a>]</p>
</div>
<span id="Character-Classes-and-Bracket-Expressions-1"></span><h3
class="section">3.2 Character Classes and Bracket Expressions</h3>
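The paragraph added in this hunk notes that not every string is a valid regular expression. This minimal sketch assumes GNU grep. It passes in several constructs that the Problematic Regular Expressions text calls invalid and records each exit status; a status of 2 signals the error.

```shell
#!/bin/sh
# Assumes GNU grep; each pattern is one the manual lists as invalid.
statuses=
for re in 'xy\1' '\(a\)\2' '[a[:ouch:]b]'; do
  status=0
  printf 'xy\n' | grep -e "$re" >/dev/null 2>&1 || status=$?
  printf '%s -> exit status %d\n' "$re" "$status"
  statuses="$statuses $status"
done

# In the C (POSIX) locale, a range such as z-a represents zero elements.
range_status=0
printf 'm\n' | LC_ALL=C grep '[z-a]' >/dev/null 2>&1 || range_status=$?
printf '%s -> exit status %d\n' '[z-a]' "$range_status"
```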
@@ -1439,7 +1435,7 @@
In other locales, the sorting sequence is not specified, and
‘<samp>[a-d]</samp>’ might be equivalent to
‘<samp>[abcd]</samp>’ or to
‘<samp>[aBbCcDd]</samp>’, or it might fail to match any character,
or the set of
-characters that it matches might even be erratic.
+characters that it matches might be erratic, or it might be invalid.
To obtain the traditional interpretation
of bracket expressions, you can use the ‘<samp>C</samp>’ locale by
setting the
<code>LC_ALL</code> environment variable to the value
‘<samp>C</samp>’.
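The hunk above recommends LC_ALL=C for the traditional reading of bracket ranges. This tiny sketch applies that setting. The sample lines are made up.

```shell
#!/bin/sh
# With LC_ALL=C, [a-d] means exactly the bytes a, b, c and d, so the
# uppercase B and the lowercase e are not matched.
c_range=$(printf 'b\nB\ne\n' | LC_ALL=C grep '[a-d]')
printf '%s\n' "$c_range"   # b
```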
@@ -1541,11 +1537,10 @@
part of the symbolic names, and must be included in addition to
the brackets delimiting the bracket expression.
</p>
-<span id="invalid_002dbracket_002dexpr"></span><p>If you mistakenly omit the
outer brackets, and search for say, ‘<samp>[:upper:]</samp>’,
+<p>If you mistakenly omit the outer brackets, and search for, say,
‘<samp>[:upper:]</samp>’,
GNU <code>grep</code> prints a diagnostic and exits with status 2, on
-the assumption that you did not intend to search for the nominally
-equivalent regular expression: ‘<samp>[:epru]</samp>’.
-Set the <code>POSIXLY_CORRECT</code> environment variable to disable this
feature.
+the assumption that you did not intend to search for the
+regular expression ‘<samp>[:epru]</samp>’.
</p>
<p>Special characters lose their special meaning inside bracket expressions.
</p>
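This hunk keeps the diagnostic for a missing pair of outer brackets but removes the POSIXLY_CORRECT override. Here is a quick check, assuming GNU grep.

```shell
#!/bin/sh
# Assumes GNU grep. '[:upper:]' without outer brackets is diagnosed
# (exit status 2); '[[:upper:]]' is the intended character class.
bad_status=0
printf 'PEPPER\n' | grep '[:upper:]' >/dev/null 2>&1 || bad_status=$?

good=$(printf 'pepper\nPEPPER\n' | grep '[[:upper:]]')

printf 'exit status for [:upper:]: %d\n' "$bad_status"   # 2
printf '%s\n' "$good"                                     # PEPPER
```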
@@ -1583,7 +1578,7 @@
</dd>
<dt><span>‘<samp>-</samp>’</span></dt>
<dd><p>represents the range if it’s not first or last in a list or the
ending point
-of a range.
+of a range. To make the ‘<samp>-</samp>’ a list item, it is best
to put it last.
</p>
</dd>
<dt><span>‘<samp>^</samp>’</span></dt>
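The added advice is to put '-' last when it should be an ordinary list item. Here is a minimal sketch. The sample data is made up.

```shell
#!/bin/sh
# '-' is last in the bracket expression, so it is a literal hyphen,
# not a range operator: [_-] matches '_' or '-'.
dash_last=$(printf 'a-b\na_b\na+b\n' | grep 'a[_-]b')
printf '%s\n' "$dash_last"   # a-b and a_b
```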
@@ -1596,12 +1591,12 @@
<hr>
</div>
-<div class="section" id="The-Backslash-Character-and-Special-Expressions">
+<div class="section" id="Special-Backslash-Expressions">
<div class="header">
<p>
Next: <a href="#Anchoring" accesskey="n" rel="next">Anchoring</a>, Previous:
<a href="#Character-Classes-and-Bracket-Expressions" accesskey="p"
rel="prev">Character Classes and Bracket Expressions</a>, Up: <a
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a>
[<a href="#SEC_Contents" title="Table of contents"
rel="contents">Contents</a>][<a href="#Index" title="Index"
rel="index">Index</a>]</p>
</div>
-<span id="The-Backslash-Character-and-Special-Expressions-1"></span><h3
class="section">3.3 The Backslash Character and Special Expressions</h3>
+<span id="Special-Backslash-Expressions-1"></span><h3 class="section">3.3
Special Backslash Expressions</h3>
<span id="index-backslash"></span>
<p>The ‘<samp>\</samp>’ character followed by a special character
is a regular
@@ -1643,17 +1638,32 @@
<dd><p>Match non-whitespace, it is a synonym for
‘<samp>[^[:space:]]</samp>’.
</p>
</dd>
+<dt><span>‘<samp>\]</samp>’</span></dt>
+<dd><p>Match ‘<samp>]</samp>’.
+</p>
+</dd>
+<dt><span>‘<samp>\}</samp>’</span></dt>
+<dd><p>Match ‘<samp>}</samp>’.
+</p>
+</dd>
</dl>
<p>For example, ‘<samp>\brat\b</samp>’ matches the separate word
‘<samp>rat</samp>’,
‘<samp>\Brat\B</samp>’ matches ‘<samp>crate</samp>’
but not ‘<samp>furry rat</samp>’.
</p>
+<p>The behavior of <code>grep</code> is unspecified if an unescaped backslash
+is not followed by a special character, a nonzero digit, or a
+character in the above list. Although <code>grep</code> might issue a
+diagnostic and/or give the backslash an interpretation now, its
+behavior may change if the syntax of regular expressions is extended
+in future versions.
+</p>
<hr>
</div>
<div class="section" id="Anchoring">
<div class="header">
<p>
-Next: <a href="#Back_002dreferences-and-Subexpressions" accesskey="n"
rel="next">Back-references and Subexpressions</a>, Previous: <a
href="#The-Backslash-Character-and-Special-Expressions" accesskey="p"
rel="prev">The Backslash Character and Special Expressions</a>, Up: <a
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a>
[<a href="#SEC_Contents" title="Table of contents"
rel="contents">Contents</a>][<a href="#Index" title="Index"
rel="index">Index</a>]</p>
+Next: <a href="#Back_002dreferences-and-Subexpressions" accesskey="n"
rel="next">Back-references and Subexpressions</a>, Previous: <a
href="#Special-Backslash-Expressions" accesskey="p" rel="prev">Special
Backslash Expressions</a>, Up: <a href="#Regular-Expressions" accesskey="u"
rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table
of contents" rel="contents">Contents</a>][<a href="#Index" title="Index"
rel="index">Index</a>]</p>
</div>
<span id="Anchoring-1"></span><h3 class="section">3.4 Anchoring</h3>
<span id="index-anchoring"></span>
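The renamed Special Backslash Expressions section ends with the '\brat\b' and '\Brat\B' examples. The sketch below reproduces them, assuming GNU grep. The manual's portability list says these expressions have unspecified behavior in other implementations.

```shell
#!/bin/sh
# Assumes GNU grep. \b matches at a word boundary, \B everywhere else.
lines=$(printf 'rat\ncrate\nfurry rat\n')

word=$(printf '%s\n' "$lines" | grep '\brat\b')    # rat, furry rat
inner=$(printf '%s\n' "$lines" | grep '\Brat\B')   # crate

printf 'with \\b:\n%s\n' "$word"
printf 'with \\B:\n%s\n' "$inner"
```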
@@ -1696,58 +1706,175 @@
<div class="section" id="Basic-vs-Extended">
<div class="header">
<p>
-Next: <a href="#Character-Encoding" accesskey="n" rel="next">Character
Encoding</a>, Previous: <a href="#Back_002dreferences-and-Subexpressions"
accesskey="p" rel="prev">Back-references and Subexpressions</a>, Up: <a
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a>
[<a href="#SEC_Contents" title="Table of contents"
rel="contents">Contents</a>][<a href="#Index" title="Index"
rel="index">Index</a>]</p>
+Next: <a href="#Problematic-Expressions" accesskey="n" rel="next">Problematic
Regular Expressions</a>, Previous: <a
href="#Back_002dreferences-and-Subexpressions" accesskey="p"
rel="prev">Back-references and Subexpressions</a>, Up: <a
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a>
[<a href="#SEC_Contents" title="Table of contents"
rel="contents">Contents</a>][<a href="#Index" title="Index"
rel="index">Index</a>]</p>
</div>
<span id="Basic-vs-Extended-Regular-Expressions"></span><h3
class="section">3.6 Basic vs Extended Regular Expressions</h3>
<span id="index-basic-regular-expressions"></span>
-<p>In basic regular expressions the characters ‘<samp>?</samp>’,
‘<samp>+</samp>’,
+<p>Basic regular expressions differ from extended regular expressions
+in the following ways:
+</p>
+<ul>
+<li> The characters ‘<samp>?</samp>’, ‘<samp>+</samp>’,
‘<samp>{</samp>’, ‘<samp>|</samp>’,
‘<samp>(</samp>’, and ‘<samp>)</samp>’ lose their
special meaning;
instead use the backslashed versions ‘<samp>\?</samp>’,
‘<samp>\+</samp>’, ‘<samp>\{</samp>’,
‘<samp>\|</samp>’, ‘<samp>\(</samp>’, and
‘<samp>\)</samp>’. Also, a backslash is needed
-before an interval expression’s closing ‘<samp>}</samp>’,
and an unmatched
-<code>\)</code> is invalid.
+before an interval expression’s closing ‘<samp>}</samp>’.
+
+</li><li> An unmatched ‘<samp>\)</samp>’ is invalid.
+
+</li><li> If an unescaped ‘<samp>^</samp>’ appears neither first,
nor directly after
+‘<samp>\(</samp>’ or ‘<samp>\|</samp>’, it is treated
like an ordinary character and
+is not an anchor.
+
+</li><li> If an unescaped ‘<samp>$</samp>’ appears neither last,
nor directly before
+‘<samp>\|</samp>’ or ‘<samp>\)</samp>’, it is treated
like an ordinary character and
+is not an anchor.
+
+</li><li> If an unescaped ‘<samp>*</samp>’ appears first, or
appears directly after
+‘<samp>\(</samp>’ or ‘<samp>\|</samp>’ or anchoring
‘<samp>^</samp>’, it is treated like an
+ordinary character and is not a repetition operator.
+</li></ul>
+
+<hr>
+</div>
+<div class="section" id="Problematic-Expressions">
+<div class="header">
+<p>
+Next: <a href="#Character-Encoding" accesskey="n" rel="next">Character
Encoding</a>, Previous: <a href="#Basic-vs-Extended" accesskey="p"
rel="prev">Basic vs Extended Regular Expressions</a>, Up: <a
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a>
[<a href="#SEC_Contents" title="Table of contents"
rel="contents">Contents</a>][<a href="#Index" title="Index"
rel="index">Index</a>]</p>
+</div>
+<span id="Problematic-Regular-Expressions"></span><h3 class="section">3.7
Problematic Regular Expressions</h3>
+
+<span id="index-invalid-regular-expressions"></span>
+<span id="index-unspecified-behavior-in-regular-expressions"></span>
+<p>Some strings are <em>invalid regular expressions</em> and cause
+<code>grep</code> to issue a diagnostic and fail. For example,
‘<samp>xy\1</samp>’
+is invalid because there is no parenthesized subexpression for the
+back-reference ‘<samp>\1</samp>’ to refer to.
+</p>
+<p>Also, some regular expressions have <em>unspecified behavior</em> and
+should be avoided even if <code>grep</code> does not currently diagnose
+them. For example, ‘<samp>xy\0</samp>’ has unspecified behavior
because
+‘<samp>0</samp>’ is not a special character and
‘<samp>\0</samp>’ is not a special
+backslash expression (see <a href="#Special-Backslash-Expressions">Special
Backslash Expressions</a>).
+Unspecified behavior can be particularly problematic because the set
+of matched strings might be only partially specified, or not be
+specified at all, or the expression might even be invalid.
+</p>
+<p>The following regular expression constructs are invalid on all
+platforms conforming to POSIX, so portable scripts can assume that
+<code>grep</code> rejects these constructs:
</p>
-<p>Portable scripts should avoid the following constructs, as
-POSIX says they produce undefined results:
+<ul>
+<li> A basic regular expression containing a back-reference
‘<samp>\<var>n</var></samp>’
+preceded by fewer than <var>n</var> closing parentheses. For example,
+‘<samp>\(a\)\2</samp>’ is invalid.
+
+</li><li> A bracket expression containing ‘<samp>[:</samp>’ that
does not start a
+character class; and similarly for ‘<samp>[=</samp>’ and
‘<samp>[.</samp>’. For
+example, ‘<samp>[a[:b]</samp>’ and
‘<samp>[a[:ouch:]b]</samp>’ are invalid.
+</li></ul>
+
+<p>GNU <code>grep</code> treats the following constructs as invalid.
+However, other <code>grep</code> implementations might allow them, so
+portable scripts should not rely on their being invalid:
</p>
<ul>
-<li> Extended regular expressions that use back-references.
-</li><li> Basic regular expressions that use ‘<samp>\?</samp>’,
‘<samp>\+</samp>’, or ‘<samp>\|</samp>’.
-</li><li> Empty parenthesized regular expressions like
‘<samp>()</samp>’.
-</li><li> Empty alternatives (as in, e.g, ‘<samp>a|</samp>’).
-</li><li> Repetition operators that immediately follow empty expressions,
-unescaped ‘<samp>$</samp>’, or other repetition operators.
-</li><li> A backslash escaping an ordinary character (e.g.,
‘<samp>\S</samp>’),
-unless it is a back-reference.
-</li><li> An unescaped ‘<samp>[</samp>’ that is not part of a
bracket expression.
-</li><li> In extended regular expressions, an unescaped
‘<samp>{</samp>’ that is not
-part of an interval expression.
+<li> Unescaped ‘<samp>\</samp>’ at the end of a regular expression.
+
+</li><li> Unescaped ‘<samp>[</samp>’ that does not start a bracket
expression.
+
+</li><li> A ‘<samp>\{</samp>’ in a basic regular expression that
does not start an
+interval expression.
+
+</li><li> A basic regular expression with unbalanced
‘<samp>\(</samp>’ or ‘<samp>\)</samp>’,
+or an extended regular expression with unbalanced ‘<samp>(</samp>’.
+
+</li><li> In the POSIX locale, a range expression like
‘<samp>z-a</samp>’ that
+represents zero elements. A non-GNU <code>grep</code> might treat it as
+a valid range that never matches.
+
+</li><li> An interval expression with a repetition count greater than 32767.
+(The portable POSIX limit is 255, and even interval expressions with
+smaller counts can be impractically slow on all known implementations.)
+
+</li><li> A bracket expression that contains at least three elements, the first
+and last of which are both ‘<samp>:</samp>’, or both
‘<samp>.</samp>’, or both
+‘<samp>=</samp>’. For example, a non-GNU <code>grep</code> might
treat
+‘<samp>[:alpha:]</samp>’ like
‘<samp>[[:alpha:]]</samp>’, or like
‘<samp>[:ahlp]</samp>’.
</li></ul>
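<p>The following sketch probes which patterns a given <code>grep</code> rejects as invalid. It assumes a POSIX shell and a <code>grep</code> that accepts <samp>-G</samp> (basic syntax) and <samp>-E</samp> (extended syntax), as GNU <code>grep</code> does, and it relies on <code>grep</code> exiting with a status greater than 1 when a pattern cannot be compiled. The helper name <code>is_valid_regex</code> is hypothetical.</p>

```shell
#!/bin/sh
# is_valid_regex FLAG PATTERN (hypothetical helper): succeed if this grep
# accepts PATTERN under FLAG (-G for basic, -E for extended syntax).
# grep exits with status 0 or 1 after compiling a valid pattern, and with
# a status greater than 1 on an error such as an invalid pattern.
is_valid_regex() {
  grep "$1" -e "$2" </dev/null 2>/dev/null
  [ $? -le 1 ]
}

# Probe a few basic regular expressions from the list above.
for re in 'a\(b\)' 'a\(b' '[abc'; do
  if is_valid_regex -G "$re"; then
    echo "accepted: $re"
  else
    echo "rejected: $re"
  fi
done
```

<p>With GNU <code>grep</code> the loop should report ‘<samp>a\(b\)</samp>’ as accepted and the other two patterns as rejected; as the list above warns, other implementations might accept some of them.</p>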
-<span id="index-interval-expressions-1"></span>
-<p>Traditional <code>egrep</code> did not support interval expressions and
-some <code>egrep</code> implementations use ‘<samp>\{</samp>’ and ‘<samp>\}</samp>’ instead, so
-portable scripts should avoid interval expressions in ‘<samp>grep -E</samp>’ patterns
-and should use ‘<samp>[{]</samp>’ to match a literal ‘<samp>{</samp>’.
-</p>
-<p>GNU <code>grep -E</code> attempts to support traditional usage by
-assuming that ‘<samp>{</samp>’ is not special if it would be the start of an
-invalid interval expression.
-For example, the command
-‘<samp>grep -E '{1'</samp>’ searches for the two-character string ‘<samp>{1</samp>’
-instead of reporting a syntax error in the regular expression.
-POSIX allows this behavior as an extension, but portable scripts
-should avoid it.
+<p>The following constructs have well-defined behavior in GNU
+<code>grep</code>. However, they have unspecified behavior elsewhere, so
+portable scripts should avoid them:
+</p>
+<ul>
+<li> Special backslash expressions like ‘<samp>\b</samp>’, ‘<samp>\<</samp>’, and ‘<samp>\]</samp>’.
+See <a href="#Special-Backslash-Expressions">Special Backslash Expressions</a>.
+
+</li><li> A basic regular expression that uses ‘<samp>\?</samp>’, ‘<samp>\+</samp>’, or ‘<samp>\|</samp>’.
+
+</li><li> An extended regular expression that uses back-references.
+
+</li><li> An empty regular expression, subexpression, or alternative. For
+example, ‘<samp>(a|bc|)</samp>’ is not portable; a portable equivalent is
+‘<samp>(a|bc)?</samp>’.
+
+</li><li> In a basic regular expression, an anchoring ‘<samp>^</samp>’ that appears
+directly after ‘<samp>\(</samp>’, or an anchoring ‘<samp>$</samp>’ that appears
+directly before ‘<samp>\)</samp>’.
+
+</li><li> In a basic regular expression, a repetition operator that
+directly follows another repetition operator.
+
+</li><li> In an extended regular expression, unescaped ‘<samp>{</samp>’
+that does not begin a valid interval expression.
+GNU <code>grep</code> treats the ‘<samp>{</samp>’ as an ordinary character.
+
+</li><li> A null character or an encoding error in either pattern or input data.
+See <a href="#Character-Encoding">Character Encoding</a>.
+
+</li><li> An input file that ends in a non-newline character,
+where GNU <code>grep</code> silently supplies a newline.
+</li></ul>
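<p>Here is a small sketch, assuming a POSIX shell and a POSIX-conforming <code>grep</code>, of rewriting two of the GNU-only forms listed above into portable ones: the empty alternative in ‘<samp>(a|bc|)</samp>’ becomes the optional group ‘<samp>(a|bc)?</samp>’, and the basic-syntax ‘<samp>\+</samp>’ is spelled with the standard interval ‘<samp>\{1,\}</samp>’. The sample input is made up for illustration.</p>

```shell
#!/bin/sh
# Made-up sample input, one candidate string per line.
sample='a
bc
abb
x'

# Portable ERE: an optional group instead of the empty alternative in '(a|bc|)'.
# With -x the whole line must match, so this prints "a" and "bc".
printf '%s\n' "$sample" | grep -x -E '(a|bc)?'

# Portable BRE: 'b\{1,\}' (or 'bb*') instead of the GNU-only 'b\+'.
# This prints "abb".
printf '%s\n' "$sample" | grep -x 'ab\{1,\}'
```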
+
+<p>The following constructs have unspecified behavior, in both GNU
+and other <code>grep</code> implementations. Scripts should avoid
+them whenever possible.
</p>
+<ul>
+<li> A backslash escaping an ordinary character, unless it is a
+back-reference like ‘<samp>\1</samp>’ or a special backslash expression like
+‘<samp>\<</samp>’ or ‘<samp>\b</samp>’. See <a href="#Special-Backslash-Expressions">Special Backslash Expressions</a>. For
+example, ‘<samp>\x</samp>’ has unspecified behavior now, and a future version
+of <code>grep</code> might specify ‘<samp>\x</samp>’ to have a new behavior.
+
+</li><li> A repetition operator that appears directly after an anchor, or at the
+start of a complete regular expression, parenthesized subexpression,
+or alternative. For example, ‘<samp>+|^*(+a|?-b)</samp>’ has unspecified
+behavior, whereas ‘<samp>\+|^\*(\+a|\?-b)</samp>’ is portable.
+
+</li><li> A range expression outside the POSIX locale. For example, in some
+locales ‘<samp>[a-z]</samp>’ might match some characters that are not
+lowercase letters, or might not match some lowercase letters, or might
+be invalid. With GNU <code>grep</code> it is not documented whether
+these range expressions use native code points, or use the collating
+sequence specified by the <code>LC_COLLATE</code> category, or have some
+other interpretation. Outside the POSIX locale, it is portable to use
+‘<samp>[[:lower:]]</samp>’ to match a lower-case letter, or
+‘<samp>[abcdefghijklmnopqrstuvwxyz]</samp>’ to match an ASCII lower-case
+letter.
+
+</li></ul>
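<p>The locale and escaping advice above can be tried with a sketch like the following. It assumes a POSIX shell; the sample lines are made up, and <code>LC_ALL=C</code> selects the POSIX locale only for the one command that relies on a range expression.</p>

```shell
#!/bin/sh
# Made-up sample input.
sample='hello
Hello
42'

# Portable in any locale: a character class or an explicit letter list.
# Each of these prints "hello".
printf '%s\n' "$sample" | grep -x '[[:lower:]]*'
printf '%s\n' "$sample" | grep -x '[abcdefghijklmnopqrstuvwxyz]*'

# A range such as [a-z] is well defined only in the POSIX locale,
# so select that locale for just this command. This also prints "hello".
printf '%s\n' "$sample" | LC_ALL=C grep -x '[a-z]*'

# A bare '+' at the start of an ERE has unspecified behavior;
# escaping it matches a literal '+' portably. This prints "+1".
printf '+1\n1\n' | grep -E '\+1'
```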
+
<hr>
</div>
<div class="section" id="Character-Encoding">
<div class="header">
<p>
-Next: <a href="#Matching-Non_002dASCII" accesskey="n" rel="next">Matching Non-ASCII and Non-printable Characters</a>, Previous: <a href="#Basic-vs-Extended" accesskey="p" rel="prev">Basic vs Extended Regular Expressions</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
+Next: <a href="#Matching-Non_002dASCII" accesskey="n" rel="next">Matching Non-ASCII and Non-printable Characters</a>, Previous: <a href="#Problematic-Expressions" accesskey="p" rel="prev">Problematic Regular Expressions</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
</div>
-<span id="Character-Encoding-1"></span><h3 class="section">3.7 Character Encoding</h3>
+<span id="Character-Encoding-1"></span><h3 class="section">3.8 Character Encoding</h3>
<span id="index-character-encoding"></span>
<p>The <code>LC_CTYPE</code> locale specifies the encoding of characters in
@@ -1780,7 +1907,7 @@
<p>
Previous: <a href="#Character-Encoding" accesskey="p" rel="prev">Character Encoding</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
</div>
-<span id="Matching-Non_002dASCII-and-Non_002dprintable-Characters"></span><h3 class="section">3.8 Matching Non-ASCII and Non-printable Characters</h3>
+<span id="Matching-Non_002dASCII-and-Non_002dprintable-Characters"></span><h3 class="section">3.9 Matching Non-ASCII and Non-printable Characters</h3>
<span id="index-non_002dASCII-matching"></span>
<span id="index-non_002dprintable-matching"></span>
@@ -1909,24 +2036,36 @@
</pre></div>
</li><li> What if a pattern or file has a leading ‘<samp>-</samp>’?
+For example:
<div class="example">
-<pre class="example">grep -- '--cut here--' *
+<pre class="example">grep "$pattern" *
</pre></div>
-<p>searches for all lines matching ‘<samp>--cut here--</samp>’.
-Without <samp>--</samp>,
-<code>grep</code> would attempt to parse ‘<samp>--cut here--</samp>’ as a list of
-options, and there would be similar problems with any file names
-beginning with ‘<samp>-</samp>’.
+<p>can behave unexpectedly if the value of ‘<samp>pattern</samp>’ begins with ‘<samp>-</samp>’,
+or if the ‘<samp>*</samp>’ expands to a file name with leading ‘<samp>-</samp>’.
+To avoid the problem, you can use <samp>-e</samp> for patterns and leading
+‘<samp>./</samp>’ for files:
</p>
-<p>Alternatively, you can prevent misinterpretation of leading ‘<samp>-</samp>’
-by using <samp>-e</samp> for patterns and leading ‘<samp>./</samp>’ for files:
+<div class="example">
+<pre class="example">grep -e "$pattern" ./*
+</pre></div>
+
+<p>searches for all lines matching the pattern in all the working
+directory’s files whose names do not begin with ‘<samp>.</samp>’.
+Without the <samp>-e</samp>, <code>grep</code> might treat the pattern as an
+option if it begins with ‘<samp>-</samp>’. Without the ‘<samp>./</samp>’, there might
+be similar problems with file names beginning with ‘<samp>-</samp>’.
+</p>
+<p>Alternatively, you can use ‘<samp>--</samp>’ before the pattern and file names:
</p>
<div class="example">
-<pre class="example">grep -e '--cut here--' ./*
+<pre class="example">grep -- "$pattern" *
</pre></div>
+<p>This also fixes the problem, except that if there is a file named ‘<samp>-</samp>’,
+<code>grep</code> misinterprets the ‘<samp>-</samp>’ as standard input.
+</p>
</li><li> Suppose I want to search for a whole word, not a part of a word?
<div class="example">
@@ -2000,8 +2139,7 @@
<samp>-a</samp> or ‘<samp>--binary-files=text</samp>’ option. To eliminate the
“Binary file matches” messages, use the <samp>-I</samp> or
-‘<samp>--binary-files=without-match</samp>’ option,
-or the <samp>-s</samp> or <samp>--no-messages</samp> option.
+‘<samp>--binary-files=without-match</samp>’ option.
</p>
</li><li> Why doesn’t ‘<samp>grep -lv</samp>’ print non-matching file names?
@@ -2029,7 +2167,10 @@
</p>
<p>To match empty lines, use the pattern ‘<samp>^$</samp>’. To match blank
lines, use the pattern ‘<samp>^[[:blank:]]*$</samp>’. To match no lines at
-all, use the command ‘<samp>grep -f /dev/null</samp>’.
+all, use an extended regular expression like ‘<samp>a^</samp>’ or ‘<samp>$a</samp>’.
+To match every line, a portable script should use a pattern like
+‘<samp>^</samp>’ instead of the empty pattern, as POSIX does not specify the
+behavior of the empty pattern.
</p>
</li><li> How can I search in both standard input and in files?
@@ -2039,6 +2180,21 @@
<pre class="example">cat /etc/passwd | grep 'alain' - /etc/motd
</pre></div>
+</li><li> Why can’t I combine the shell’s ‘<samp>set -e</samp>’ with <code>grep</code>?
+
+<p>The <code>grep</code> command follows the convention of programs like
+<code>cmp</code> and <code>diff</code> where an exit status of 1 is not an
+error. The shell command ‘<samp>set -e</samp>’ causes the shell to exit if
+any subcommand exits with nonzero status, and this will cause the
+shell to exit merely because <code>grep</code> selected no lines,
+which is ordinarily not what you want.
+</p>
+<p>There is a related problem with Bash’s <code>set -e -o pipefail</code>.
+Since <code>grep</code> does not always read all its input, a command
+outputting to a pipe read by <code>grep</code> can fail when
+<code>grep</code> exits before reading all its input, and the command’s
+failure can cause Bash to exit.
+</p>
</li><li> Why is this back-reference failing?
<div class="example">
@@ -2069,7 +2225,7 @@
<code>sed</code>, <code>perl</code>, or many other utilities that are
designed to operate across lines.
</p>
-</li><li> What do <code>grep</code>, <code>fgrep</code>, and <code>egrep</code> stand for?
+</li><li> What do <code>grep</code>, <samp>-E</samp>, and <samp>-F</samp> stand for?
<p>The name <code>grep</code> comes from the way line editing was done on Unix.
For example,
@@ -2081,9 +2237,29 @@
g/re/p
</pre></div>
-<p><code>fgrep</code> stands for Fixed <code>grep</code>;
-<code>egrep</code> stands for Extended <code>grep</code>.
+<p>The <samp>-E</samp> option stands for Extended <code>grep</code>.
+The <samp>-F</samp> option stands for Fixed <code>grep</code>.
+</p>
+</li><li> What happened to <code>egrep</code> and <code>fgrep</code>?
+
+<p>7th Edition Unix had commands <code>egrep</code> and <code>fgrep</code>
+that were the counterparts of the modern ‘<samp>grep -E</samp>’ and ‘<samp>grep -F</samp>’.
+Although breaking up <code>grep</code> into three programs was perhaps
+useful on the small computers of the 1970s, <code>egrep</code> and
+<code>fgrep</code> were not standardized by POSIX and are no longer needed.
+In the current GNU implementation, <code>egrep</code> and <code>fgrep</code>
+issue a warning and then act like their modern counterparts;
+eventually, they are planned to be removed entirely.
+</p>
+<p>If you prefer the old names, you can use your own substitutes,
+such as a shell script named <code>egrep</code> with the following
+contents:
</p>
+<div class="example">
+<pre class="example">#!/bin/sh
+exec grep -E "$@"
+</pre></div>
+
</li></ol>
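<p>Two of the answers above, using <samp>-e</samp> for a pattern that may begin with ‘<samp>-</samp>’ and coping with exit status 1 under ‘<samp>set -e</samp>’, can be combined in a short sketch. It assumes a POSIX shell; <code>grep_allow_none</code> is a hypothetical helper name, not part of <code>grep</code>.</p>

```shell
#!/bin/sh
set -e

# grep_allow_none ARGS... (hypothetical helper): run grep, but treat exit
# status 1 ("no lines selected") as success so that set -e does not end
# the script. A real error (status 2 or more) still yields a nonzero
# status, which set -e then treats as fatal.
grep_allow_none() {
  grep "$@" || [ $? -eq 1 ]
}

pattern='-v'   # a pattern that happens to start with '-'

# -e makes grep read "$pattern" as a pattern, not as the -v option.
# This prints "keep -v here".
printf 'keep -v here\nother\n' | grep_allow_none -e "$pattern"

# No line matches here, yet the script keeps running under set -e.
printf 'other\n' | grep_allow_none -e "$pattern"
echo "still running"
```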
@@ -2125,6 +2301,17 @@
surprisingly inefficient due to difficulties in fast portable access to
concepts like multi-character collating elements.
</p>
+<span id="index-interval-expressions-1"></span>
+<p>Interval expressions may be implemented internally via repetition.
+For example, ‘<samp>^(a|bc){2,4}$</samp>’ might be implemented as
+‘<samp>^(a|bc)(a|bc)((a|bc)(a|bc)?)?$</samp>’. A large repetition count may
+exhaust memory or greatly slow matching. Even small counts can cause
+problems if cascaded; for example, ‘<samp>grep -E ".*{10,}{10,}{10,}{10,}{10,}"</samp>’ is likely to overflow a
+stack. Fortunately, regular expressions like these are typically
+artificial, and cascaded repetitions do not conform to POSIX, so they cannot
+be used in portable programs anyway.
+</p>
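<p>To check that the hand-expanded form quoted above selects the same lines as the interval expression, the two can be compared on a few made-up strings. This is only a behavioral check, assuming a POSIX shell and <code>grep -E</code>; it says nothing about how any particular <code>grep</code> implements intervals internally.</p>

```shell
#!/bin/sh
# Made-up sample strings built from the pieces "a" and "bc".
sample='a
aa
abca
bcbcbcbc
aaaaa'

interval='^(a|bc){2,4}$'
expanded='^(a|bc)(a|bc)((a|bc)(a|bc)?)?$'

# Both commands print aa, abca, and bcbcbcbc: the lines made of two to
# four pieces. "a" (one piece) and "aaaaa" (five pieces) are rejected.
printf '%s\n' "$sample" | grep -E -e "$interval"
printf '%s\n' "$sample" | grep -E -e "$expanded"
```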
<span id="index-back_002dreferences"></span>
<p>A back-reference such as ‘<samp>\1</samp>’ can hurt performance significantly
in some cases, since back-references cannot in general be implemented
@@ -2145,6 +2332,14 @@
<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used (see <a href="#File-and-Directory-Selection">File and Directory Selection</a>), unless
the <samp>-z</samp> (<samp>--null-data</samp>)
option is also used (see <a href="#Other-Options">Other Options</a>).
</p>
+<span id="index-pipelines-and-reading"></span>
+<p>For efficiency, <code>grep</code> does not always read all its input.
+For example, the shell command ‘<samp>sed '/^...$/d' | grep -q X</samp>’ can
+cause <code>grep</code> to exit immediately after reading a line
+containing ‘<samp>X</samp>’, without bothering to read the rest of its input data.
+This in turn can cause <code>sed</code> to exit with a nonzero status because
+<code>sed</code> cannot write to its output pipe after <code>grep</code> exits.
+</p>
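<p>When the writer’s exit status matters, one possible workaround for the early exit described above is to keep reading after <code>grep -q</code> has decided. This is a sketch under stated assumptions rather than a <code>grep</code> feature: it assumes a POSIX shell and uses <code>awk</code> as a stand-in writer, and <code>search_and_drain</code> is a hypothetical helper name.</p>

```shell
#!/bin/sh
# search_and_drain PATTERN (hypothetical helper): return grep -q's status
# (0 if some input line matches PATTERN, 1 if none, 2 or more on error),
# then read and discard the rest of standard input so that the command
# writing into the pipe is not cut off when grep stops early.
search_and_drain() {
  grep -q -e "$1"
  grep_status=$?
  cat >/dev/null
  return "$grep_status"
}

# awk writes 200000 lines; the match is found near the start, but the
# writer can still finish because cat consumes whatever grep left unread.
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i }' |
  search_and_drain '^5$' && echo "found"
```

<p>The cost of this design is that the whole input is read even after the answer is known, which gives up the efficiency that <code>grep -q</code> normally provides.</p>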
<p>For more about the algorithms used by <code>grep</code> and about
related string matching algorithms, see:
</p>
@@ -2157,20 +2352,33 @@
</li><li> Aho AV, Corasick MJ. Efficient string matching: an aid to bibliographic search.
<em>CACM</em>. 1975;18(6):333–40.
-<a href="https://dx.doi.org/10.1145/360825.360855">https://dx.doi.org/10.1145/360825.360855</a>.
+<a href="https://doi.org/10.1145/360825.360855">https://doi.org/10.1145/360825.360855</a>.
This introduces the Aho–Corasick algorithm.
</li><li> Boyer RS, Moore JS. A fast string searching algorithm.
<em>CACM</em>. 1977;20(10):762–72.
-<a href="https://dx.doi.org/10.1145/359842.359859">https://dx.doi.org/10.1145/359842.359859</a>.
+<a href="https://doi.org/10.1145/359842.359859">https://doi.org/10.1145/359842.359859</a>.
This introduces the Boyer–Moore algorithm.
</li><li> Faro S, Lecroq T. The exact online string matching problem: a review of the most recent results.
<em>ACM Comput Surv</em>. 2013;45(2):13.
-<a href="https://dx.doi.org/10.1145/2431211.2431212">https://dx.doi.org/10.1145/2431211.2431212</a>.
+<a href="https://doi.org/10.1145/2431211.2431212">https://doi.org/10.1145/2431211.2431212</a>.
This surveys string matching algorithms that might help improve the
performance of <code>grep</code> in the future.
+
+</li><li> Hakak SI, Kamsin A, Shivakumara P, Gilkar GA, Khan WZ, Imran M.
+Exact string matching algorithms: survey, issues, and future research directions.
+<em>IEEE Access</em>. 2019;7:69614–37.
+<a href="https://doi.org/10.1109/ACCESS.2019.2914071">https://doi.org/10.1109/ACCESS.2019.2914071</a>.
+This survey is more recent than Faro & Lecroq,
+and focuses on taxonomy instead of performance.
+
+</li><li> Hume A, Sunday D. Fast string search.
+<em>Software Pract Exper</em>. 1991;21(11):1221–48.
+<a href="https://doi.org/10.1002/spe.4380211105">https://doi.org/10.1002/spe.4380211105</a>.
+This excellent albeit now-dated survey aided the initial development
+of <code>grep</code>.
</li></ul>
<hr>
@@ -2928,13 +3136,14 @@
<tr><td></td><td valign="top"><a href="#index-alpha-character-class"><code>alpha <span class="roman">character class</span></code></a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-alphabetic-characters">alphabetic characters</a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-alphanumeric-characters">alphanumeric characters</a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-alternatives-in-regular-expressions">alternatives in regular expressions</a>:</td><td> </td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-anchoring">anchoring</a>:</td><td> </td><td valign="top"><a href="#Anchoring">Anchoring</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-asterisk">asterisk</a>:</td><td> </td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th id="Index_cp_letter-B">B</th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-back_002dreference">back-reference</a>:</td><td> </td><td valign="top"><a href="#Back_002dreferences-and-Subexpressions">Back-references and Subexpressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-back_002dreferences">back-references</a>:</td><td> </td><td valign="top"><a href="#Performance">Performance</a></td></tr>
-<tr><td></td><td valign="top"><a href="#index-backslash">backslash</a>:</td><td> </td><td valign="top"><a href="#The-Backslash-Character-and-Special-Expressions">The Backslash Character and Special Expressions</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-backslash">backslash</a>:</td><td> </td><td valign="top"><a href="#Special-Backslash-Expressions">Special Backslash Expressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-basic-regular-expressions">basic regular expressions</a>:</td><td> </td><td valign="top"><a href="#Basic-vs-Extended">Basic vs Extended</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-before-context">before context</a>:</td><td> </td><td valign="top"><a href="#Context-Line-Control">Context Line Control</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-binary-files">binary files</a>:</td><td> </td><td valign="top"><a href="#File-and-Directory-Selection">File and Directory Selection</a></td></tr>
@@ -3012,7 +3221,8 @@
<tr><th id="Index_cp_letter-I">I</th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-include-files">include files</a>:</td><td> </td><td valign="top"><a href="#File-and-Directory-Selection">File and Directory Selection</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-interval-expressions">interval expressions</a>:</td><td> </td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
-<tr><td></td><td valign="top"><a href="#index-interval-expressions-1">interval expressions</a>:</td><td> </td><td valign="top"><a href="#Basic-vs-Extended">Basic vs Extended</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-interval-expressions-1">interval expressions</a>:</td><td> </td><td valign="top"><a href="#Performance">Performance</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-invalid-regular-expressions">invalid regular expressions</a>:</td><td> </td><td valign="top"><a href="#Problematic-Expressions">Problematic Expressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-invert-matching">invert matching</a>:</td><td> </td><td valign="top"><a href="#Matching-Control">Matching Control</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th id="Index_cp_letter-L">L</th><td></td><td></td></tr>
@@ -3081,6 +3291,7 @@
<tr><td></td><td valign="top"><a href="#index-patterns-option">patterns option</a>:</td><td> </td><td valign="top"><a href="#Matching-Control">Matching Control</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-performance">performance</a>:</td><td> </td><td valign="top"><a href="#Performance">Performance</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-period">period</a>:</td><td> </td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-pipelines-and-reading">pipelines and reading</a>:</td><td> </td><td valign="top"><a href="#Performance">Performance</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-plus-sign">plus sign</a>:</td><td> </td><td valign="top"><a href="#Fundamental-Structure">Fundamental Structure</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-POSIXLY_005fCORRECT-environment-variable"><code>POSIXLY_CORRECT <span class="roman">environment variable</span></code></a>:</td><td> </td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-print-character-class"><code>print <span class="roman">character class</span></code></a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
@@ -3121,9 +3332,11 @@
<tr><td colspan="4"> <hr></td></tr>
<tr><th id="Index_cp_letter-T">T</th><td></td><td></td></tr>
<tr><td></td><td valign="top"><a href="#index-tab_002daligned-content-lines">tab-aligned content lines</a>:</td><td> </td><td valign="top"><a href="#Output-Line-Prefix-Control">Output Line Prefix Control</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-TERM-environment-variable"><code>TERM <span class="roman">environment variable</span></code></a>:</td><td> </td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-translation-of-message-language">translation of message language</a>:</td><td> </td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
<tr><th id="Index_cp_letter-U">U</th><td></td><td></td></tr>
+<tr><td></td><td valign="top"><a href="#index-unspecified-behavior-in-regular-expressions">unspecified behavior in regular expressions</a>:</td><td> </td><td valign="top"><a href="#Problematic-Expressions">Problematic Expressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-upper-character-class"><code>upper <span class="roman">character class</span></code></a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-upper_002dcase-letters">upper-case letters</a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-usage-summary_002c-printing">usage summary, printing</a>:</td><td> </td><td valign="top"><a href="#Generic-Program-Information">Generic Program Information</a></td></tr>
@@ -3212,14 +3425,6 @@
</div>
</div>
-<div class="footnote">
-<hr>
-<h4 class="footnotes-heading">Footnotes</h4>
-
-<h5><a id="FOOT1" href="#DOCF1">(1)</a></h5>
-<p>Of course, 7th Edition
-Unix predated POSIX by several years!</p>
-</div>