From: | Paul Eggert |
Subject: | bug#18266: handling bytes not part of the charset, and other garbage |
Date: | Thu, 11 Sep 2014 20:26:12 -0700 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 |
Vincent Lefevre wrote:
> ypig% LC_ALL=C locale charmap
> ANSI_X3.4-1968
That may be what the 'locale' command says, but in that locale bytes with the top bit set are treated as valid single-byte characters, so there are no encoding errors. In that sense it's not strict ASCII.
> the current behavior breaks the sometimes used "grep ." solution to match non-empty lines.
"grep ." matches lines containing one or more characters. Encoding errors are not characters, at least, not as far as plain grep is concerned.
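To make the distinction concrete, here is a rough Python model of the rule just described. It is only a sketch and not grep's actual code: the helper name has_character is made up for illustration, Python's decoder stands in for the C library's multibyte conversion, and "latin-1" is used as an assumed stand-in for a locale where every byte is a valid single-byte character.

```python
def has_character(line: bytes, encoding: str = "utf-8") -> bool:
    """Model of 'grep .': does the line hold at least one valid character?

    Hypothetical helper, not part of grep. Bytes that fail to decode in
    the given encoding are treated as encoding errors, not characters.
    """
    body = line.rstrip(b"\n")  # the trailing newline is never matched by "."
    # errors="ignore" drops undecodable bytes, leaving only valid characters
    return len(body.decode(encoding, errors="ignore")) > 0


if __name__ == "__main__":
    for sample in (b"abc\n", b"\n", b"\xff\xfe\n", b"a\xff\n"):
        print(sample, has_character(sample))
    # Same bytes, but in an encoding where every byte is a character
    print(b"\xff\xfe\n", has_character(b"\xff\xfe\n", "latin-1"))
```

Under this model a line made up only of invalid bytes is non-empty yet contains no characters, which is why "grep ." skips it in a multibyte locale but not in one where every byte counts as a character.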
Perhaps PCRE is different, and if libpcre worked with encoding errors we could simply use its way of matching them. But there doesn't seem to be a safe way to do that.