[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Changes to grep/manual/html_node/Performance.html,v
From: |
Jim Meyering |
Subject: |
Changes to grep/manual/html_node/Performance.html,v |
Date: |
Sat, 3 Sep 2022 15:33:16 -0400 (EDT) |
CVSROOT: /webcvs/grep
Module name: grep
Changes by: Jim Meyering <meyering> 22/09/03 15:33:15
Index: html_node/Performance.html
===================================================================
RCS file: /webcvs/grep/grep/manual/html_node/Performance.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -b -r1.4 -r1.5
--- html_node/Performance.html 14 Aug 2021 20:46:41 -0000 1.4
+++ html_node/Performance.html 3 Sep 2022 19:33:14 -0000 1.5
@@ -5,7 +5,7 @@
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This manual is for grep, a pattern matching engine.
-Copyright (C) 1999-2002, 2005, 2008-2021 Free Software Foundation,
+Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
Inc.
Permission is granted to copy, distribute and/or modify this document
@@ -14,10 +14,10 @@
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled
"GNU Free Documentation License". -->
-<title>Performance (GNU Grep 3.7)</title>
+<title>Performance (GNU Grep 3.8)</title>
-<meta name="description" content="Performance (GNU Grep 3.7)">
-<meta name="keywords" content="Performance (GNU Grep 3.7)">
+<meta name="description" content="Performance (GNU Grep 3.8)">
+<meta name="keywords" content="Performance (GNU Grep 3.8)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
@@ -91,6 +91,17 @@
surprisingly inefficient due to difficulties in fast portable access to
concepts like multi-character collating elements.
</p>
+<span id="index-interval-expressions-1"></span>
+<p>Interval expressions may be implemented internally via repetition.
+For example, ‘<samp>^(a|bc){2,4}$</samp>’ might be implemented as
+‘<samp>^(a|bc)(a|bc)((a|bc)(a|bc)?)?$</samp>’. A large repetition
count may
+exhaust memory or greatly slow matching. Even small counts can cause
+problems if cascaded; for example, ‘<samp>grep -E
+".*{10,}{10,}{10,}{10,}{10,}"</samp>’ is likely to overflow a
+stack. Fortunately, regular expressions like these are typically
+artificial, and cascaded repetitions do not conform to POSIX so cannot
+be used in portable programs anyway.
+</p>
<span id="index-back_002dreferences"></span>
<p>A back-reference such as ‘<samp>\1</samp>’ can hurt performance
significantly
in some cases, since back-references cannot in general be implemented
@@ -111,6 +122,14 @@
<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used (see <a
href="File-and-Directory-Selection.html">File and Directory Selection</a>),
unless the <samp>-z</samp> (<samp>--null-data</samp>)
option is also used (see <a href="Other-Options.html">Other Options</a>).
</p>
+<span id="index-pipelines-and-reading"></span>
+<p>For efficiency <code>grep</code> does not always read all its input.
+For example, the shell command ‘<samp>sed '/^...$/d' | grep -q
X</samp>’ can
+cause <code>grep</code> to exit immediately after reading a line
+containing ‘<samp>X</samp>’, without bothering to read the rest of
its input data.
+This in turn can cause <code>sed</code> to exit with a nonzero status because
+<code>sed</code> cannot write to its output pipe after <code>grep</code> exits.
+</p>
<p>For more about the algorithms used by <code>grep</code> and about
related string matching algorithms, see:
</p>
@@ -123,20 +142,33 @@
</li><li> Aho AV, Corasick MJ. Efficient string matching: an aid to
bibliographic search.
<em>CACM</em>. 1975;18(6):333–40.
-<a
href="https://dx.doi.org/10.1145/360825.360855">https://dx.doi.org/10.1145/360825.360855</a>.
+<a
href="https://doi.org/10.1145/360825.360855">https://doi.org/10.1145/360825.360855</a>.
This introduces the Aho–Corasick algorithm.
</li><li> Boyer RS, Moore JS. A fast string searching algorithm.
<em>CACM</em>. 1977;20(10):762–72.
-<a
href="https://dx.doi.org/10.1145/359842.359859">https://dx.doi.org/10.1145/359842.359859</a>.
+<a
href="https://doi.org/10.1145/359842.359859">https://doi.org/10.1145/359842.359859</a>.
This introduces the Boyer–Moore algorithm.
</li><li> Faro S, Lecroq T. The exact online string matching problem: a review
of the most recent results.
<em>ACM Comput Surv</em>. 2013;45(2):13.
-<a
href="https://dx.doi.org/10.1145/2431211.2431212">https://dx.doi.org/10.1145/2431211.2431212</a>.
+<a
href="https://doi.org/10.1145/2431211.2431212">https://doi.org/10.1145/2431211.2431212</a>.
This surveys string matching algorithms that might help improve the
performance of <code>grep</code> in the future.
+
+</li><li> Hakak SI, Kamsin A, Shivakumara P, Gilkar GA, Khan WZ, Imran M.
+Exact string matching algorithms: survey issues, and future research
directions.
+<em>IEEE Access</em>. 2019;7:69614–37.
+<a
href="https://doi.org/10.1109/ACCESS.2019.2914071">https://doi.org/10.1109/ACCESS.2019.2914071</a>.
+This survey is more recent than Faro & Lecroq,
+and focuses on taxonomy instead of performance.
+
+</li><li> Hume A, Sunday D. Fast string search.
+<em>Software Pract Exper</em>. 1991;21(11):1221–48.
+<a
href="https://doi.org/10.1002/spe.4380211105">https://doi.org/10.1002/spe.4380211105</a>.
+This excellent albeit now-dated survey aided the initial development
+of <code>grep</code>.
</li></ul>
</div>
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Changes to grep/manual/html_node/Performance.html,v,
Jim Meyering <=