From: Paul Eggert
Subject: bug#25336:
Date: Mon, 2 Jan 2017 10:30:32 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1
Zepp Lu wrote:
> $ printf '\x53\xef' | grep -aoP '\x53\xef'
> (no output, returns 1)
> $ printf '\x53\xc3\xaf' | grep -aoP '\x53\xef'
> Sï
> $ printf '\x53\xc3\xef' | grep -aoP '\x53\xef'
> (no output, returns 1)
I don't see a bug here. PCRE patterns like \xef match code points, not bytes, so the PCRE notation differs from the shell printf notation. If your locale uses UTF-8, the PCRE pattern \xef matches the Unicode character U+00EF LATIN SMALL LETTER I WITH DIAERESIS, which is represented by the byte pair C3 AF.
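One way to check the encoding claim is to convert the byte pair C3 AF from UTF-8 to big-endian UTF-32, which spells out the code point as a plain number. This is a sketch that assumes an iconv knowing the UTF-8 and UTF-32BE encodings (glibc's does); octal escapes are used because POSIX does not require printf to understand \x.

```shell
# The UTF-8 byte pair C3 AF, written with portable octal escapes (\303\257).
# Converting it to UTF-32BE should yield the four bytes 00 00 00 ef, i.e. the
# code point U+00EF, which is what PCRE's \xef denotes in a UTF-8 locale.
printf '\303\257' | iconv -f UTF-8 -t UTF-32BE | od -An -tx1
```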
If you want \xef to match a single byte, run grep in a single-byte locale, e.g., set LC_ALL=C in the environment.
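A minimal sketch of the single-byte workaround, assuming GNU grep built with PCRE support for -P. The input is the same 'S' plus raw 0xEF byte as in the first command of the report, and od is used only to show the matched bytes in hex.

```shell
# In the C locale each byte is one character, so PCRE's \xef matches the
# single byte 0xEF. Input bytes are written as portable octal escapes:
# \123 = 0x53 ('S'), \357 = 0xEF.
# od should show 53 ef 0a; the trailing 0a is the newline grep -o prints
# after each match.
printf '\123\357' | LC_ALL=C grep -aoP '\x53\xef' | od -An -tx1
```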
> grep (version 2.12-2) provided by Debian works just fine.
Actually, grep 2.12 is buggy in this area; it can sometimes dump core.