grep-commit
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Changes to grep/manual/html_node/Performance.html,v


From: Jim Meyering
Subject: Changes to grep/manual/html_node/Performance.html,v
Date: Sat, 3 Sep 2022 15:33:16 -0400 (EDT)

CVSROOT:        /webcvs/grep
Module name:    grep
Changes by:     Jim Meyering <meyering> 22/09/03 15:33:15

Index: html_node/Performance.html
===================================================================
RCS file: /webcvs/grep/grep/manual/html_node/Performance.html,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -b -r1.4 -r1.5
--- html_node/Performance.html  14 Aug 2021 20:46:41 -0000      1.4
+++ html_node/Performance.html  3 Sep 2022 19:33:14 -0000       1.5
@@ -5,7 +5,7 @@
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <!-- This manual is for grep, a pattern matching engine.
 
-Copyright (C) 1999-2002, 2005, 2008-2021 Free Software Foundation,
+Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
 Inc.
 
 Permission is granted to copy, distribute and/or modify this document
@@ -14,10 +14,10 @@
 Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
 Texts.  A copy of the license is included in the section entitled
 "GNU Free Documentation License". -->
-<title>Performance (GNU Grep 3.7)</title>
+<title>Performance (GNU Grep 3.8)</title>
 
-<meta name="description" content="Performance (GNU Grep 3.7)">
-<meta name="keywords" content="Performance (GNU Grep 3.7)">
+<meta name="description" content="Performance (GNU Grep 3.8)">
+<meta name="keywords" content="Performance (GNU Grep 3.8)">
 <meta name="resource-type" content="document">
 <meta name="distribution" content="global">
 <meta name="Generator" content="makeinfo">
@@ -91,6 +91,17 @@
 surprisingly inefficient due to difficulties in fast portable access to
 concepts like multi-character collating elements.
 </p>
+<span id="index-interval-expressions-1"></span>
+<p>Interval expressions may be implemented internally via repetition.
+For example, &lsquo;<samp>^(a|bc){2,4}$</samp>&rsquo; might be implemented as
+&lsquo;<samp>^(a|bc)(a|bc)((a|bc)(a|bc)?)?$</samp>&rsquo;.  A large repetition 
count may
+exhaust memory or greatly slow matching.  Even small counts can cause
+problems if cascaded; for example, &lsquo;<samp>grep -E
+&quot;.*{10,}{10,}{10,}{10,}{10,}&quot;</samp>&rsquo; is likely to overflow a
+stack.  Fortunately, regular expressions like these are typically
+artificial, and cascaded repetitions do not conform to POSIX so cannot
+be used in portable programs anyway.
+</p>
 <span id="index-back_002dreferences"></span>
 <p>A back-reference such as &lsquo;<samp>\1</samp>&rsquo; can hurt performance 
significantly
 in some cases, since back-references cannot in general be implemented
@@ -111,6 +122,14 @@
 <samp>-a</samp> (<samp>--binary-files=text</samp>) option is used (see <a 
href="File-and-Directory-Selection.html">File and Directory Selection</a>), 
unless the <samp>-z</samp> (<samp>--null-data</samp>)
 option is also used (see <a href="Other-Options.html">Other Options</a>).
 </p>
+<span id="index-pipelines-and-reading"></span>
+<p>For efficiency <code>grep</code> does not always read all its input.
+For example, the shell command &lsquo;<samp>sed '/^...$/d' | grep -q 
X</samp>&rsquo; can
+cause <code>grep</code> to exit immediately after reading a line
+containing &lsquo;<samp>X</samp>&rsquo;, without bothering to read the rest of 
its input data.
+This in turn can cause <code>sed</code> to exit with a nonzero status because
+<code>sed</code> cannot write to its output pipe after <code>grep</code> exits.
+</p>
 <p>For more about the algorithms used by <code>grep</code> and about
 related string matching algorithms, see:
 </p>
@@ -123,20 +142,33 @@
 
 </li><li> Aho AV, Corasick MJ. Efficient string matching: an aid to 
bibliographic search.
 <em>CACM</em>. 1975;18(6):333&ndash;40.
-<a 
href="https://dx.doi.org/10.1145/360825.360855";>https://dx.doi.org/10.1145/360825.360855</a>.
+<a 
href="https://doi.org/10.1145/360825.360855";>https://doi.org/10.1145/360825.360855</a>.
 This introduces the Aho&ndash;Corasick algorithm.
 
 </li><li> Boyer RS, Moore JS. A fast string searching algorithm.
 <em>CACM</em>. 1977;20(10):762&ndash;72.
-<a 
href="https://dx.doi.org/10.1145/359842.359859";>https://dx.doi.org/10.1145/359842.359859</a>.
+<a 
href="https://doi.org/10.1145/359842.359859";>https://doi.org/10.1145/359842.359859</a>.
 This introduces the Boyer&ndash;Moore algorithm.
 
 </li><li> Faro S, Lecroq T. The exact online string matching problem: a review
 of the most recent results.
 <em>ACM Comput Surv</em>. 2013;45(2):13.
-<a 
href="https://dx.doi.org/10.1145/2431211.2431212";>https://dx.doi.org/10.1145/2431211.2431212</a>.
+<a 
href="https://doi.org/10.1145/2431211.2431212";>https://doi.org/10.1145/2431211.2431212</a>.
 This surveys string matching algorithms that might help improve the
 performance of <code>grep</code> in the future.
+
+</li><li> Hakak SI, Kamsin A, Shivakumara P, Gilkar GA, Khan WZ, Imran M.
+Exact string matching algorithms: survey issues, and future research 
directions.
+<em>IEEE Access</em>. 2019;7:69614&ndash;37.
+<a 
href="https://doi.org/10.1109/ACCESS.2019.2914071";>https://doi.org/10.1109/ACCESS.2019.2914071</a>.
+This survey is more recent than Faro &amp; Lecroq,
+and focuses on taxonomy instead of performance.
+
+</li><li> Hume A, Sunday D. Fast string search.
+<em>Software Pract Exper</em>. 1991;21(11):1221&ndash;48.
+<a 
href="https://doi.org/10.1002/spe.4380211105";>https://doi.org/10.1002/spe.4380211105</a>.
+This excellent albeit now-dated survey aided the initial development
+of <code>grep</code>.
 </li></ul>
 
 </div>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]