bug-grep

bug#25336:


From: Paul Eggert
Subject: bug#25336:
Date: Mon, 2 Jan 2017 10:30:32 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

Zepp Lu wrote:

> $ printf '\x53\xef' | grep -aoP '\x53\xef'
> (no output, returns 1)
> $ printf '\x53\xc3\xaf' | grep -aoP '\x53\xef'
> Sï
> $ printf '\x53\xc3\xef' | grep -aoP '\x53\xef'
> (no output, returns 1)

I don't see a bug here. PCRE patterns like \xef match code points, not bytes, so the PCRE notation differs from the shell printf notation. If your locale uses UTF-8, the PCRE pattern \xef matches the Unicode character U+00EF LATIN SMALL LETTER I WITH DIAERESIS, which is represented by the byte pair C3 AF.
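
To see the byte pair for yourself, here is a minimal sketch. It assumes od, tr, and an iconv that knows the names ISO-8859-1 and UTF-8 are installed; hex_bytes is a hypothetical helper name, not part of grep. The printf format uses the octal escape \357 for byte EF, since not every shell's printf accepts \x escapes.

```shell
# Hypothetical helper: print the bytes on stdin as hex digits, no separators.
hex_bytes() {
  od -An -tx1 | tr -d ' \t\n'
}

# Take the single Latin-1 byte EF (octal 357), i.e. code point U+00EF,
# and re-encode it as UTF-8. This prints c3af.
printf '\357' | iconv -f ISO-8859-1 -t UTF-8 | hex_bytes; echo
```

So in a UTF-8 locale the pattern \xef needs the two bytes C3 AF in the input, which is why only the second of the three commands quoted above matches.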

If you want \xef to match a single byte, run grep in a single-byte locale, e.g., set LC_ALL=C in the environment.
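
For instance, something along these lines, as a sketch only. It assumes a reasonably recent GNU grep built with PCRE support (-P); c_locale_match_hex is a hypothetical name used just for this illustration. The input is written with octal escapes (\123\357 for bytes 53 EF) so that it does not depend on the shell's printf understanding \x, while the pattern is left exactly as in the report.

```shell
# Hypothetical helper: feed the raw bytes 53 EF to grep in the C locale
# and dump whatever it matched as hex digits, no separators.
# LC_ALL=C is set only for the grep invocation, not for the whole shell.
c_locale_match_hex() {
  printf '\123\357' | LC_ALL=C grep -aoP '\x53\xef' | od -An -tx1 | tr -d ' \t\n'
}

c_locale_match_hex; echo
```

If the match succeeds, the dump should show the bytes 53 and EF followed by the newline (0a) that grep -o appends. An empty line instead would suggest that this grep lacks -P or behaves differently in this area.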

> grep (version 2.12-2) provided by Debian works just fine.

Actually, it's buggy in this area. Sometimes it can dump core.




