Changes to html_node/Matching-Non_002dASCII.html
From: Jim Meyering
Subject: Changes to html_node/Matching-Non_002dASCII.html
Date: Sun, 27 Sep 2020 23:36:54 -0400 (EDT)
CVSROOT: /webcvs/grep
Module name: grep
Changes by: Jim Meyering <meyering> 20/09/27 23:36:49
Index: html_node/Matching-Non_002dASCII.html
===================================================================
RCS file: html_node/Matching-Non_002dASCII.html
diff -N html_node/Matching-Non_002dASCII.html
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ html_node/Matching-Non_002dASCII.html 28 Sep 2020 03:36:49 -0000
1.1
@@ -0,0 +1,116 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- This manual is for grep, a pattern matching engine.
+
+Copyright (C) 1999-2002, 2005, 2008-2020 Free Software Foundation,
+Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts. A copy of the license is included in the section entitled
+"GNU Free Documentation License". -->
+<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<title>Matching Non-ASCII (GNU Grep 3.5)</title>
+
+<meta name="description" content="Matching Non-ASCII (GNU Grep 3.5)">
+<meta name="keywords" content="Matching Non-ASCII (GNU Grep 3.5)">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Index.html#Index" rel="index" title="Index">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="Regular-Expressions.html#Regular-Expressions" rel="up"
title="Regular Expressions">
+<link href="Usage.html#Usage" rel="next" title="Usage">
+<link href="Character-Encoding.html#Character-Encoding" rel="prev"
title="Character Encoding">
+<style type="text/css">
+<!--
+a.summary-letter {text-decoration: none}
+blockquote.indentedblock {margin-right: 0em}
+blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style: oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nolinebreak {white-space: nowrap}
+span.roman {font-family: initial; font-weight: normal}
+span.sansserif {font-family: sans-serif; font-weight: normal}
+ul.no-bullet {list-style: none}
+-->
+</style>
+<link rel="stylesheet" type="text/css" href="/software/gnulib/manual.css">
+
+
+</head>
+
+<body lang="en">
+<a name="Matching-Non_002dASCII"></a>
+<div class="header">
+<p>
+Previous: <a href="Character-Encoding.html#Character-Encoding" accesskey="p"
rel="prev">Character Encoding</a>, Up: <a
href="Regular-Expressions.html#Regular-Expressions" accesskey="u"
rel="up">Regular Expressions</a> [<a href="index.html#SEC_Contents"
title="Table of contents" rel="contents">Contents</a>][<a
href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<a name="Matching-Non_002dASCII-and-Non_002dprintable-Characters"></a>
+<h3 class="section">3.8 Matching Non-ASCII and Non-printable Characters</h3>
+<a name="index-non_002dASCII-matching"></a>
+<a name="index-non_002dprintable-matching"></a>
+
+<p>In a regular expression, non-ASCII and non-printable characters other
+than newline are not special, and represent themselves. For example,
+in a locale using UTF-8 the command ‘<samp>grep 'Λ	ω'</samp>’ (where the
+white space between ‘<samp>Λ</samp>’ and the ‘<samp>ω</samp>’ is a tab character)
+searches for ‘<samp>Λ</samp>’ (Unicode character U+039B GREEK CAPITAL LETTER
+LAMBDA), followed by a tab (U+0009 TAB), followed by ‘<samp>ω</samp>’ (U+03C9
+GREEK SMALL LETTER OMEGA).
+</p>
+<p>Suppose you want to limit your pattern to only printable characters
+(or even only printable ASCII characters) to keep your script readable
+or portable, but you also want to match specific non-ASCII or non-null
+non-printable characters. If you are using the <samp>-P</samp>
+(<samp>--perl-regexp</samp>) option, PCREs give you several ways to do
+this. Otherwise, if you are using Bash, the GNU project’s shell, you
+can represent these characters via ANSI-C quoting. For example, the
+Bash commands ‘<samp>grep $'Λ\tω'</samp>’ and ‘<samp>grep $'\u039B\t\u03C9'</samp>’
+both search for the same three-character string ‘<samp>Λ	ω</samp>’
+mentioned earlier. However, because Bash translates ANSI-C quoting
+before <code>grep</code> sees the pattern, this technique should not be
+used to match printable ASCII characters; for example, ‘<samp>grep
+$'\u005E'</samp>’ is equivalent to ‘<samp>grep '^'</samp>’
and matches any line, not
+just lines containing the character ‘<samp>^</samp>’ (U+005E
CIRCUMFLEX
+ACCENT).
+</p>
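The circumflex pitfall described above can be checked with a small sketch. It assumes a POSIX shell and uses the standard -F (fixed strings) option, which this section does not mention, as one possible way to match the character literally; the variable names and sample text are illustrative only.

```shell
# Two sample lines; only the first contains a circumflex.
sample=$(printf 'a^b\nplain\n')

# A bare '^' is the start-of-line anchor, so every line matches.
anchor_count=$(printf '%s\n' "$sample" | grep -c '^')

# With -F the pattern is a fixed string, so only lines that
# actually contain the character '^' match.
literal_count=$(printf '%s\n' "$sample" | grep -c -F '^')

echo "$anchor_count $literal_count"
```

The first count covers both lines and the second only the line containing the circumflex, which is the difference the paragraph warns about.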
+<p>Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable
+shell scripts written in ASCII should use other methods to match
+specific non-ASCII characters. For example, in a UTF-8 locale the
+command ‘<samp>grep "$(printf '\316\233\t\317\211\n')"</samp>’ is a portable
+albeit hard-to-read alternative to Bash’s ‘<samp>grep $'Λ\tω'</samp>’.
+However, none of these techniques will let you put a null character
+directly into a command-line pattern; null characters can appear only
+in a pattern specified via the <samp>-f</samp> (<samp>--file</samp>) option.
+</p>
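As a rough sketch of the portable printf technique above, the following builds the pattern from octal escapes and counts matching lines in a small sample. The variable names and sample input are hypothetical; the octal bytes are the UTF-8 encodings of U+039B and U+03C9 given in the text.

```shell
# Build the pattern from octal escapes: 316 233 is U+039B in UTF-8,
# \t is a tab, and 317 211 is U+03C9. The script itself stays pure ASCII.
pattern=$(printf '\316\233\t\317\211')

# Sample input: the first line contains the three-character sequence,
# the second line does not.
count=$(printf 'x \316\233\t\317\211 y\nplain ascii line\n' | grep -c "$pattern")

echo "$count"
```

Storing the pattern in a variable keeps the grep invocation readable while leaving the script free of non-ASCII bytes.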
+<hr>
+<div class="header">
+<p>
+Previous: <a href="Character-Encoding.html#Character-Encoding" accesskey="p"
rel="prev">Character Encoding</a>, Up: <a
href="Regular-Expressions.html#Regular-Expressions" accesskey="u"
rel="up">Regular Expressions</a> [<a href="index.html#SEC_Contents"
title="Table of contents" rel="contents">Contents</a>][<a
href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>