grep-commit

Changes to html_node/Problematic-Expressions.html


From: Jim Meyering
Subject: Changes to html_node/Problematic-Expressions.html
Date: Sat, 3 Sep 2022 15:33:16 -0400 (EDT)

CVSROOT:        /webcvs/grep
Module name:    grep
Changes by:     Jim Meyering <meyering> 22/09/03 15:33:15

Index: html_node/Problematic-Expressions.html
===================================================================
RCS file: html_node/Problematic-Expressions.html
diff -N html_node/Problematic-Expressions.html
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ html_node/Problematic-Expressions.html      3 Sep 2022 19:33:14 -0000       
1.1
@@ -0,0 +1,197 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<!-- This manual is for grep, a pattern matching engine.
+
+Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
+Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts.  A copy of the license is included in the section entitled
+"GNU Free Documentation License". -->
+<title>Problematic Expressions (GNU Grep 3.8)</title>
+
+<meta name="description" content="Problematic Expressions (GNU Grep 3.8)">
+<meta name="keywords" content="Problematic Expressions (GNU Grep 3.8)">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<meta name="viewport" content="width=device-width,initial-scale=1">
+
+<link href="index.html" rel="start" title="Top">
+<link href="Index.html" rel="index" title="Index">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="Regular-Expressions.html" rel="up" title="Regular Expressions">
+<link href="Character-Encoding.html" rel="next" title="Character Encoding">
+<link href="Basic-vs-Extended.html" rel="prev" title="Basic vs Extended">
+<style type="text/css">
+<!--
+a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
+a.summary-letter {text-decoration: none}
+blockquote.indentedblock {margin-right: 0em}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+kbd {font-style: oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+span.nolinebreak {white-space: nowrap}
+span.roman {font-family: initial; font-weight: normal}
+span.sansserif {font-family: sans-serif; font-weight: normal}
+span:hover a.copiable-anchor {visibility: visible}
+ul.no-bullet {list-style: none}
+-->
+</style>
+<link rel="stylesheet" type="text/css" href="https://www.gnu.org/software/gnulib/manual.css">
+
+
+</head>
+
+<body lang="en">
+<div class="section" id="Problematic-Expressions">
+<div class="header">
+<p>
+Next: <a href="Character-Encoding.html" accesskey="n" rel="next">Character Encoding</a>, Previous: <a href="Basic-vs-Extended.html" accesskey="p" rel="prev">Basic vs Extended Regular Expressions</a>, Up: <a href="Regular-Expressions.html" accesskey="u" rel="up">Regular Expressions</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<span id="Problematic-Regular-Expressions"></span><h3 class="section">3.7 Problematic Regular Expressions</h3>
+
+<span id="index-invalid-regular-expressions"></span>
+<span id="index-unspecified-behavior-in-regular-expressions"></span>
+<p>Some strings are <em>invalid regular expressions</em> and cause
+<code>grep</code> to issue a diagnostic and fail.  For example, &lsquo;<samp>xy\1</samp>&rsquo;
+is invalid because there is no parenthesized subexpression for the
+back-reference &lsquo;<samp>\1</samp>&rsquo; to refer to.
+</p>
+<p>Also, some regular expressions have <em>unspecified behavior</em> and
+should be avoided even if <code>grep</code> does not currently diagnose
+them.  For example, &lsquo;<samp>xy\0</samp>&rsquo; has unspecified behavior because
+&lsquo;<samp>0</samp>&rsquo; is not a special character and &lsquo;<samp>\0</samp>&rsquo; is not a special
+backslash expression (see <a href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>).
+Unspecified behavior can be particularly problematic because the set
+of matched strings might be only partially specified, or not be
+specified at all, or the expression might even be invalid.
+</p>
+<p>The following regular expression constructs are invalid on all
+platforms conforming to POSIX, so portable scripts can assume that
+<code>grep</code> rejects these constructs:
+</p>
+<ul>
+<li> A basic regular expression containing a back-reference &lsquo;<samp>\<var>n</var></samp>&rsquo;
+preceded by fewer than <var>n</var> closing parentheses.  For example,
+&lsquo;<samp>\(a\)\2</samp>&rsquo; is invalid.
+
+</li><li> A bracket expression containing &lsquo;<samp>[:</samp>&rsquo; that does not start a
+character class; and similarly for &lsquo;<samp>[=</samp>&rsquo; and &lsquo;<samp>[.</samp>&rsquo;.  For
+example, &lsquo;<samp>[a[:b]</samp>&rsquo; and &lsquo;<samp>[a[:ouch:]b]</samp>&rsquo; are invalid.
+</li></ul>
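+
+<p>As a rough illustration of the list above, the following shell sketch probes how the local <code>grep</code> treats a few of these patterns.  The helper name <code>is_rejected</code> is hypothetical and not part of <code>grep</code>; the sketch assumes only that errors are reported with an exit status greater than 1, as POSIX specifies for <code>grep</code>.
+</p>
+<div class="example">
+<pre class="example">
```shell
# is_rejected PATTERN: succeed if the local grep refuses PATTERN as a
# basic regular expression.  Exit statuses 0 and 1 mean "matched" and
# "no match"; anything greater than 1 signals an error.
# (is_rejected is a hypothetical helper name chosen for this sketch.)
is_rejected() {
  printf 'probe\n' | grep -e "$1" >/dev/null 2>/dev/null
  [ "$?" -gt 1 ]
}

# Patterns taken from the examples in this section.
for pattern in 'xy\1' '\(a\)\2' '[a[:b]' '[a[:ouch:]b]'; do
  if is_rejected "$pattern"; then
    printf 'rejected: %s\n' "$pattern"
  else
    printf 'accepted: %s\n' "$pattern"
  fi
done
```
+</pre></div>
+
+<p>On a conforming implementation each of the four patterns should be reported as rejected.  Checking the exit status rather than the diagnostic text keeps the probe independent of how a given implementation words its messages.
+</p>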
+
+<p>GNU <code>grep</code> treats the following constructs as invalid.
+However, other <code>grep</code> implementations might allow them, so
+portable scripts should not rely on their being invalid:
+</p>
+<ul>
+<li> Unescaped &lsquo;<samp>\</samp>&rsquo; at the end of a regular expression.
+
+</li><li> Unescaped &lsquo;<samp>[</samp>&rsquo; that does not start a bracket expression.
+
+</li><li> A &lsquo;<samp>\{</samp>&rsquo; in a basic regular expression that does not start an
+interval expression.
+
+</li><li> A basic regular expression with unbalanced &lsquo;<samp>\(</samp>&rsquo; or &lsquo;<samp>\)</samp>&rsquo;,
+or an extended regular expression with unbalanced &lsquo;<samp>(</samp>&rsquo;.
+
+</li><li> In the POSIX locale, a range expression like &lsquo;<samp>z-a</samp>&rsquo; that
+represents zero elements.  A non-GNU <code>grep</code> might treat it as
+a valid range that never matches.
+
+</li><li> An interval expression with a repetition count greater than 32767.
+(The portable POSIX limit is 255, and even interval expressions with
+smaller counts can be impractically slow on all known implementations.)
+
+</li><li> A bracket expression that contains at least three elements, the first
+and last of which are both &lsquo;<samp>:</samp>&rsquo;, or both &lsquo;<samp>.</samp>&rsquo;, or both
+&lsquo;<samp>=</samp>&rsquo;.  For example, a non-GNU <code>grep</code> might treat
+&lsquo;<samp>[:alpha:]</samp>&rsquo; like &lsquo;<samp>[[:alpha:]]</samp>&rsquo;, or like &lsquo;<samp>[:ahlp]</samp>&rsquo;.
+</li></ul>
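+
+<p>Because other implementations may accept the constructs above with differing meanings, a script is safer using spellings that are unambiguous everywhere.  The sketch below shows two such spellings: a character class written inside a bracket expression, and a literal left bracket written with a backslash.  Both helper names are hypothetical.
+</p>
+<div class="example">
+<pre class="example">
```shell
# Both helper names below are hypothetical, chosen for this sketch.

# Count input lines that contain at least one alphabetic character.
# The class name sits inside a bracket expression: '[[:alpha:]]',
# not the ambiguous '[:alpha:]'.
count_alpha_lines() {
  grep -c -e '[[:alpha:]]'
}

# Count input lines that contain a literal left bracket, written as
# an escaped '\[' so that it cannot be taken for the start of a
# bracket expression.
count_bracket_lines() {
  grep -c -e '\['
}

printf 'abc\n123\nx9\n' | count_alpha_lines
printf 'a[1]\nb\n' | count_bracket_lines
```
+</pre></div>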
+
+<p>The following constructs have well-defined behavior in GNU
+<code>grep</code>.  However, they have unspecified behavior elsewhere, so
+portable scripts should avoid them:
+</p>
+<ul>
+<li> Special backslash expressions like &lsquo;<samp>\b</samp>&rsquo;, &lsquo;<samp>\&lt;</samp>&rsquo;, and &lsquo;<samp>\]</samp>&rsquo;.
+See <a href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>.
+
+</li><li> A basic regular expression that uses &lsquo;<samp>\?</samp>&rsquo;, &lsquo;<samp>\+</samp>&rsquo;, or &lsquo;<samp>\|</samp>&rsquo;.
+
+</li><li> An extended regular expression that uses back-references.
+
+</li><li> An empty regular expression, subexpression, or alternative.  For
+example, &lsquo;<samp>(a|bc|)</samp>&rsquo; is not portable; a portable equivalent is
+&lsquo;<samp>(a|bc)?</samp>&rsquo;.
+
+</li><li> In a basic regular expression, an anchoring &lsquo;<samp>^</samp>&rsquo; that appears
+directly after &lsquo;<samp>\(</samp>&rsquo;, or an anchoring &lsquo;<samp>$</samp>&rsquo; that appears
+directly before &lsquo;<samp>\)</samp>&rsquo;.
+
+</li><li> In a basic regular expression, a repetition operator that
+directly follows another repetition operator.
+
+</li><li> In an extended regular expression, unescaped &lsquo;<samp>{</samp>&rsquo;
+that does not begin a valid interval expression.
+GNU <code>grep</code> treats the &lsquo;<samp>{</samp>&rsquo; as an ordinary character.
+
+</li><li> A null character or an encoding error in either pattern or input data.
+See <a href="Character-Encoding.html">Character Encoding</a>.
+
+</li><li> An input file that ends in a non-newline character,
+where GNU <code>grep</code> silently supplies a newline.
+</li></ul>
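+
+<p>Several of the constructs above have portable replacements.  The sketch below uses the &lsquo;<samp>(a|bc)?</samp>&rsquo; equivalent given in the list, and additionally assumes that POSIX interval expressions are an acceptable stand-in for the &lsquo;<samp>\+</samp>&rsquo; and &lsquo;<samp>\?</samp>&rsquo; extensions in basic regular expressions; that substitution is an illustration, not something this section prescribes.
+</p>
+<div class="example">
+<pre class="example">
```shell
# Count whole lines that are empty, "a", or "bc".
# '(a|bc)?' is the portable equivalent given for '(a|bc|)'.
printf 'a\nbc\n\nabc\n' | grep -E -x -c -e '(a|bc)?'

# In a basic regular expression, interval expressions can replace the
# '\+' and '\?' extensions: 'a\{1,\}' is one or more "a", and
# 'a\{0,1\}' is zero or one "a".
printf 'b\nab\naab\n' | grep -x -c -e 'a\{1,\}b'
printf 'b\nab\naab\n' | grep -x -c -e 'a\{0,1\}b'
```
+</pre></div>
+
+<p>The <samp>-x</samp> option restricts each match to the whole line, which makes the effect of the optional and repeated parts easy to see in the counts.
+</p>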
+
+<p>The following constructs have unspecified behavior, in both GNU
+and other <code>grep</code> implementations.  Scripts should avoid
+them whenever possible:
+</p>
+<ul>
+<li> A backslash escaping an ordinary character, unless it is a
+back-reference like &lsquo;<samp>\1</samp>&rsquo; or a special backslash expression like
+&lsquo;<samp>\&lt;</samp>&rsquo; or &lsquo;<samp>\b</samp>&rsquo;.  See <a href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>.  For
+example, &lsquo;<samp>\x</samp>&rsquo; has unspecified behavior now, and a future version
+of <code>grep</code> might specify &lsquo;<samp>\x</samp>&rsquo; to have a new behavior.
+
+</li><li> A repetition operator that appears directly after an anchor, or at the
+start of a complete regular expression, parenthesized subexpression,
+or alternative.  For example, &lsquo;<samp>+|^*(+a|?-b)</samp>&rsquo; has unspecified
+behavior, whereas &lsquo;<samp>\+|^\*(\+a|\?-b)</samp>&rsquo; is portable.
+
+</li><li> A range expression outside the POSIX locale.  For example, in some
+locales &lsquo;<samp>[a-z]</samp>&rsquo; might match some characters that are not
+lowercase letters, or might not match some lowercase letters, or might
+be invalid.  With GNU <code>grep</code> it is not documented whether
+these range expressions use native code points, or use the collating
+sequence specified by the <code>LC_COLLATE</code> category, or have some
+other interpretation.  Outside the POSIX locale, it is portable to use
+&lsquo;<samp>[[:lower:]]</samp>&rsquo; to match a lowercase letter, or
+&lsquo;<samp>[abcdefghijklmnopqrstuvwxyz]</samp>&rsquo; to match an ASCII lowercase
+letter.
+
+</li></ul>
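+
+<p>The alternatives to a bare range expression mentioned in the last item can be sketched as follows.  The three helper names are hypothetical.  The third helper forces the POSIX locale with <code>LC_ALL=C</code>, on the assumption that the script only cares about ASCII input.
+</p>
+<div class="example">
+<pre class="example">
```shell
# All three helper names are hypothetical, chosen for this sketch.

# Count lines that contain an ASCII lowercase letter, spelling the
# set out instead of relying on the range '[a-z]'.
count_ascii_lower_lines() {
  grep -c -e '[abcdefghijklmnopqrstuvwxyz]'
}

# Count lines that contain a lowercase letter of the current locale.
count_lower_lines() {
  grep -c -e '[[:lower:]]'
}

# Count lines that contain an ASCII lowercase letter by running grep
# in the POSIX locale, where the range '[a-z]' is well defined.
count_c_locale_lower_lines() {
  LC_ALL=C grep -c -e '[a-z]'
}

printf 'abc\nABC\n123\n' | count_ascii_lower_lines
printf 'abc\nABC\n123\n' | count_lower_lines
printf 'abc\nABC\n123\n' | count_c_locale_lower_lines
```
+</pre></div>
+
+<p>For this ASCII-only sample all three helpers count the same single line; they can differ on input that contains non-ASCII letters.
+</p>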
+
+</div>
+<hr>
+<div class="header">
+<p>
+Next: <a href="Character-Encoding.html">Character Encoding</a>, Previous: <a href="Basic-vs-Extended.html">Basic vs Extended Regular Expressions</a>, Up: <a href="Regular-Expressions.html">Regular Expressions</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>


