grep branch, master, updated. v3.7-17-gf0d97db
From: Paul Eggert
Subject: grep branch, master, updated. v3.7-17-gf0d97db
Date: Fri, 27 Aug 2021 21:21:27 -0400 (EDT)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".
The branch, master has been updated
via f0d97db2a2104c5fd558178713054f3f267623b2 (commit)
via fd72f5d2c2a9a6a220e98af1c0230f1ae6e0a8d2 (commit)
from e3694e90b4789ccafaf022a29d9ce08ff11375c2 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=f0d97db2a2104c5fd558178713054f3f267623b2
commit f0d97db2a2104c5fd558178713054f3f267623b2
Author: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri Aug 27 18:20:58 2021 -0700
doc: document interval expression limitations
* doc/grep.texi (Basic vs Extended, Performance):
Document limitations of interval expressions (Bug#44538).
diff --git a/doc/grep.texi b/doc/grep.texi
index b92ecb7..e5b9fd8 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1526,7 +1526,7 @@ before an interval expression's closing @samp{@}}, and an unmatched
@code{\)} is invalid.
Portable scripts should avoid the following constructs, as
-POSIX says they produce undefined results:
+POSIX says they produce unspecified results:
@itemize @bullet
@item
@@ -1541,6 +1541,8 @@ Empty alternatives (as in, e.g, @samp{a|}).
Repetition operators that immediately follow empty expressions,
unescaped @samp{$}, or other repetition operators.
@item
+Interval expressions containing repetition counts greater than 255.
+@item
A backslash escaping an ordinary character (e.g., @samp{\S}),
unless it is a back-reference.
@item
@@ -1965,6 +1967,17 @@ bracket expressions like @samp{[a-z]} and @samp{[[=a=]b]}, can be
surprisingly inefficient due to difficulties in fast portable access to
concepts like multi-character collating elements.
+@cindex interval expressions
+Interval expressions may be implemented internally via repetition.
+For example, @samp{^(a|bc)@{2,4@}$} might be implemented as
+@samp{^(a|bc)(a|bc)((a|bc)(a|bc)?)?$}. A large repetition count may
+exhaust memory or greatly slow matching. Even small counts can cause
+problems if cascaded; for example, @samp{grep -E
+".*@{10,@}@{10,@}@{10,@}@{10,@}@{10,@}"} is likely to overflow a
+stack. Fortunately, regular expressions like these are typically
+artificial, and cascaded repetitions do not conform to POSIX so cannot
+be used in portable programs anyway.
+
@cindex back-references
A back-reference such as @samp{\1} can hurt performance significantly
in some cases, since back-references cannot in general be implemented
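
Note on the interval-expression paragraph added above: the expansion it
describes can be sketched in a few lines of Python. This is only an
illustration of the naive rewriting the documentation mentions, not grep's
actual implementation. The helper names expand_interval and cascaded_size
are hypothetical, and the sketch assumes the repeated atom is a single
character or an already parenthesized group.

```python
import re


def expand_interval(atom, lo, hi=None):
    """Rewrite atom{lo,hi} using only concatenation, '?' and '*'.

    Assumption: `atom` is a single character or a parenthesized group.
    hi=None stands for the open-ended form atom{lo,}.
    """
    required = atom * lo                 # the mandatory copies
    if hi is None:
        return required + atom + "*"     # atom{lo,} -> lo copies, then atom*
    optional = ""
    for i in range(hi - lo):             # nest one optional copy per extra count
        if i == 0:
            optional = atom + "?"
        else:
            optional = "(" + atom + optional + ")?"
    return required + optional


def cascaded_size(levels, count=10):
    """Length of the expansion of (.*){count,}{count,}... with `levels` intervals."""
    pattern = "(.*)"
    for _ in range(levels):
        pattern = "(" + expand_interval(pattern, count) + ")"
    return len(pattern)


if __name__ == "__main__":
    expanded = "^" + expand_interval("(a|bc)", 2, 4) + "$"
    print(expanded)

    # Cross-check the rewritten pattern against the interval form.
    original = re.compile(r"^(a|bc){2,4}$")
    rewritten = re.compile(expanded)
    for text in ["", "a", "aa", "abc", "bcbc", "aaaa", "aaaaa", "ab"]:
        assert bool(original.match(text)) == bool(rewritten.match(text))

    # Each cascaded {10,} multiplies the expanded size by roughly eleven.
    for levels in range(1, 6):
        print(levels, cascaded_size(levels))
```

The first print reproduces the expansion quoted in the documentation for
@samp{^(a|bc)@{2,4@}$}. The size loop shows why cascaded intervals are
costly under this scheme: the expanded text grows geometrically with each
added interval, which is consistent with the memory and stack problems the
paragraph warns about.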
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=fd72f5d2c2a9a6a220e98af1c0230f1ae6e0a8d2
-----------------------------------------------------------------------
Summary of changes:
doc/grep.texi | 15 ++++++++++++++-
gnulib | 2 +-
src/system.h | 4 ++--
3 files changed, 17 insertions(+), 4 deletions(-)
hooks/post-receive
--
grep