Changes to html_node/Problematic-Expressions.html
From: Jim Meyering
Subject: Changes to html_node/Problematic-Expressions.html
Date: Sat, 3 Sep 2022 15:33:16 -0400 (EDT)
CVSROOT: /webcvs/grep
Module name: grep
Changes by: Jim Meyering <meyering> 22/09/03 15:33:15
Index: html_node/Problematic-Expressions.html
===================================================================
RCS file: html_node/Problematic-Expressions.html
diff -N html_node/Problematic-Expressions.html
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ html_node/Problematic-Expressions.html 3 Sep 2022 19:33:14 -0000 1.1
@@ -0,0 +1,197 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<!-- This manual is for grep, a pattern matching engine.
+
+Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
+Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts. A copy of the license is included in the section entitled
+"GNU Free Documentation License". -->
+<title>Problematic Expressions (GNU Grep 3.8)</title>
+
+<meta name="description" content="Problematic Expressions (GNU Grep 3.8)">
+<meta name="keywords" content="Problematic Expressions (GNU Grep 3.8)">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<meta name="viewport" content="width=device-width,initial-scale=1">
+
+<link href="index.html" rel="start" title="Top">
+<link href="Index.html" rel="index" title="Index">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="Regular-Expressions.html" rel="up" title="Regular Expressions">
+<link href="Character-Encoding.html" rel="next" title="Character Encoding">
+<link href="Basic-vs-Extended.html" rel="prev" title="Basic vs Extended">
+<style type="text/css">
+<!--
+a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
+a.summary-letter {text-decoration: none}
+blockquote.indentedblock {margin-right: 0em}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+kbd {font-style: oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+span.nolinebreak {white-space: nowrap}
+span.roman {font-family: initial; font-weight: normal}
+span.sansserif {font-family: sans-serif; font-weight: normal}
+span:hover a.copiable-anchor {visibility: visible}
+ul.no-bullet {list-style: none}
+-->
+</style>
+<link rel="stylesheet" type="text/css" href="https://www.gnu.org/software/gnulib/manual.css">
+
+
+</head>
+
+<body lang="en">
+<div class="section" id="Problematic-Expressions">
+<div class="header">
+<p>
+Next: <a href="Character-Encoding.html" accesskey="n" rel="next">Character Encoding</a>, Previous: <a href="Basic-vs-Extended.html" accesskey="p" rel="prev">Basic vs Extended Regular Expressions</a>, Up: <a href="Regular-Expressions.html" accesskey="u" rel="up">Regular Expressions</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<span id="Problematic-Regular-Expressions"></span><h3 class="section">3.7 Problematic Regular Expressions</h3>
+
+<span id="index-invalid-regular-expressions"></span>
+<span id="index-unspecified-behavior-in-regular-expressions"></span>
+<p>Some strings are <em>invalid regular expressions</em> and cause
+<code>grep</code> to issue a diagnostic and fail. For example, ‘<samp>xy\1</samp>’
+is invalid because there is no parenthesized subexpression for the
+back-reference ‘<samp>\1</samp>’ to refer to.
+</p>
+<p>Also, some regular expressions have <em>unspecified behavior</em> and
+should be avoided even if <code>grep</code> does not currently diagnose
+them. For example, ‘<samp>xy\0</samp>’ has unspecified behavior because
+‘<samp>0</samp>’ is not a special character and ‘<samp>\0</samp>’ is not a special
+backslash expression (see <a href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>).
+Unspecified behavior can be particularly problematic because the set
+of matched strings might be only partially specified, or not be
+specified at all, or the expression might even be invalid.
+</p>
+<p>The following regular expression constructs are invalid on all
+platforms conforming to POSIX, so portable scripts can assume that
+<code>grep</code> rejects these constructs:
+</p>
+<ul>
+<li> A basic regular expression containing a back-reference ‘<samp>\<var>n</var></samp>’
+preceded by fewer than <var>n</var> closing parentheses. For example,
+‘<samp>\(a\)\2</samp>’ is invalid.
+
+</li><li> A bracket expression containing ‘<samp>[:</samp>’ that does not start a
+character class; and similarly for ‘<samp>[=</samp>’ and ‘<samp>[.</samp>’. For
+example, ‘<samp>[a[:b]</samp>’ and ‘<samp>[a[:ouch:]b]</samp>’ are invalid.
+</li></ul>
+
+<p>GNU <code>grep</code> treats the following constructs as invalid.
+However, other <code>grep</code> implementations might allow them, so
+portable scripts should not rely on their being invalid:
+</p>
+<ul>
+<li> Unescaped ‘<samp>\</samp>’ at the end of a regular expression.
+
+</li><li> Unescaped ‘<samp>[</samp>’ that does not start a bracket expression.
+
+</li><li> A ‘<samp>\{</samp>’ in a basic regular expression that does not start an
+interval expression.
+
+</li><li> A basic regular expression with unbalanced ‘<samp>\(</samp>’ or ‘<samp>\)</samp>’,
+or an extended regular expression with unbalanced ‘<samp>(</samp>’.
+
+</li><li> In the POSIX locale, a range expression like ‘<samp>z-a</samp>’ that
+represents zero elements. A non-GNU <code>grep</code> might treat it as
+a valid range that never matches.
+
+</li><li> An interval expression with a repetition count greater than 32767.
+(The portable POSIX limit is 255, and even interval expressions with
+smaller counts can be impractically slow on all known implementations.)
+
+</li><li> A bracket expression that contains at least three elements, the first
+and last of which are both ‘<samp>:</samp>’, or both ‘<samp>.</samp>’, or both
+‘<samp>=</samp>’. For example, a non-GNU <code>grep</code> might treat
+‘<samp>[:alpha:]</samp>’ like ‘<samp>[[:alpha:]]</samp>’, or like ‘<samp>[:ahlp]</samp>’.
+</li></ul>
+
+<p>The following constructs have well-defined behavior in GNU
+<code>grep</code>. However, they have unspecified behavior elsewhere, so
+portable scripts should avoid them:
+</p>
+<ul>
+<li> Special backslash expressions like ‘<samp>\b</samp>’, ‘<samp>\&lt;</samp>’, and ‘<samp>\]</samp>’.
+See <a href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>.
+
+</li><li> A basic regular expression that uses ‘<samp>\?</samp>’, ‘<samp>\+</samp>’, or ‘<samp>\|</samp>’.
+
+</li><li> An extended regular expression that uses back-references.
+
+</li><li> An empty regular expression, subexpression, or alternative. For
+example, ‘<samp>(a|bc|)</samp>’ is not portable; a portable equivalent is
+‘<samp>(a|bc)?</samp>’.
+
+</li><li> In a basic regular expression, an anchoring ‘<samp>^</samp>’ that appears
+directly after ‘<samp>\(</samp>’, or an anchoring ‘<samp>$</samp>’ that appears
+directly before ‘<samp>\)</samp>’.
+
+</li><li> In a basic regular expression, a repetition operator that
+directly follows another repetition operator.
+
+</li><li> In an extended regular expression, unescaped ‘<samp>{</samp>’
+that does not begin a valid interval expression.
+GNU <code>grep</code> treats the ‘<samp>{</samp>’ as an ordinary character.
+
+</li><li> A null character or an encoding error in either pattern or input data.
+See <a href="Character-Encoding.html">Character Encoding</a>.
+
+</li><li> An input file that ends in a non-newline character,
+where GNU <code>grep</code> silently supplies a newline.
+</li></ul>
+
+<p>The following constructs have unspecified behavior, in both GNU
+and other <code>grep</code> implementations. Scripts should avoid
+them whenever possible.
+</p>
+<ul>
+<li> A backslash escaping an ordinary character, unless it is a
+back-reference like ‘<samp>\1</samp>’ or a special backslash expression like
+‘<samp>\&lt;</samp>’ or ‘<samp>\b</samp>’. See <a href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>. For
+example, ‘<samp>\x</samp>’ has unspecified behavior now, and a future version
+of <code>grep</code> might specify ‘<samp>\x</samp>’ to have a new behavior.
+
+</li><li> A repetition operator that appears directly after an anchor, or at the
+start of a complete regular expression, parenthesized subexpression,
+or alternative. For example, ‘<samp>+|^*(+a|?-b)</samp>’ has unspecified
+behavior, whereas ‘<samp>\+|^\*(\+a|\?-b)</samp>’ is portable.
+
+</li><li> A range expression outside the POSIX locale. For example, in some
+locales ‘<samp>[a-z]</samp>’ might match some characters that are not
+lowercase letters, or might not match some lowercase letters, or might
+be invalid. With GNU <code>grep</code> it is not documented whether
+these range expressions use native code points, or use the collating
+sequence specified by the <code>LC_COLLATE</code> category, or have some
+other interpretation. Outside the POSIX locale, it is portable to use
+‘<samp>[[:lower:]]</samp>’ to match a lower-case letter, or
+‘<samp>[abcdefghijklmnopqrstuvwxyz]</samp>’ to match an ASCII lower-case
+letter.
+
+</li></ul>
+
+</div>
+<hr>
+<div class="header">
+<p>
+Next: <a href="Character-Encoding.html">Character Encoding</a>, Previous: <a href="Basic-vs-Extended.html">Basic vs Extended Regular Expressions</a>, Up: <a href="Regular-Expressions.html">Regular Expressions</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>
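
The first POSIX-invalid construct listed in the page above (a back-reference \n in a basic regular expression preceded by fewer than n closing parentheses) can be sketched as a small standalone check. This is only an illustration, not code from grep: the names bad_backrefs and _skip_bracket are hypothetical, and the bracket-expression handling is simplified on the assumption that a backslash inside brackets is an ordinary character.

```python
def _skip_bracket(bre, i):
    """Return the index just past the bracket expression starting at bre[i].

    Simplified (assumption): handles a leading ^, a leading literal ],
    and [: :], [= =], [. .] elements; an unterminated bracket
    expression swallows the rest of the pattern.
    """
    j = i + 1
    if j < len(bre) and bre[j] == "^":
        j += 1
    if j < len(bre) and bre[j] == "]":  # a leading ] is a literal
        j += 1
    while j < len(bre):
        if bre[j] == "[" and j + 1 < len(bre) and bre[j + 1] in ":=.":
            end = bre.find(bre[j + 1] + "]", j + 2)
            if end == -1:
                return len(bre)
            j = end + 2
        elif bre[j] == "]":
            return j + 1
        else:
            j += 1
    return len(bre)


def bad_backrefs(bre):
    """List each back-reference digit n preceded by fewer than n '\\)'."""
    bad = []
    closed = 0  # number of \) seen so far
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "[":
            # Backslashes inside a bracket expression are not back-references.
            i = _skip_bracket(bre, i)
            continue
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            if nxt == ")":
                closed += 1
            elif nxt in "123456789" and int(nxt) > closed:
                bad.append(int(nxt))
            i += 2  # skip the escaped character, including \\
            continue
        i += 1
    return bad
```

With this sketch, bad_backrefs(r"\(a\)\2") and bad_backrefs(r"xy\1") each report one offending back-reference, matching the two invalid examples in the page, while bad_backrefs(r"\(a\)\1") reports none.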