bug#18266: handling bytes not part of the charset, and other garbage

bug-grep

From:	Paul Eggert
Subject:	bug#18266: handling bytes not part of the charset, and other garbage
Date:	Thu, 11 Sep 2014 20:26:12 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

Vincent Lefevre wrote:

> ypig% LC_ALL=C locale charmap
> ANSI_X3.4-1968

That may be what the 'locale' command says, but bytes with the top bit on are considered to be valid single-byte characters. There are no encoding errors. So, in that sense it's not strict ASCII.

> the current behavior breaks the sometimes used "grep ." solution
> to match non-empty lines.

"grep ." matches lines containing one or more characters. Encoding errors are not characters, at least, not as far as plain grep is concerned.
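As a rough illustration of that distinction (a model only, not grep's implementation), the Python sketch below treats a line as matching "." when at least one byte sequence in it decodes to a character. The function name has_character is hypothetical, and the "latin-1" codec is only an assumed stand-in for a locale in which every byte is a valid single-byte character, as described above for the C locale.

```python
def has_character(line: bytes, encoding: str) -> bool:
    """Return True if `line` holds at least one validly encoded character.

    Hypothetical helper: a model of "a line containing one or more
    characters", not how grep itself is implemented.
    """
    # errors="ignore" drops undecodable bytes (encoding errors),
    # so only real characters remain in the decoded string.
    return len(line.decode(encoding, errors="ignore")) > 0


# A lone 0x80 byte is an encoding error in UTF-8, so the line has no characters.
print(has_character(b"\x80", "utf-8"))
# The same byte is a valid single-byte character in an 8-bit charset
# (latin-1 is an assumed stand-in for the C-locale behavior).
print(has_character(b"\x80", "latin-1"))
# One valid character next to an encoding error is enough to match.
print(has_character(b"a\x80", "utf-8"))
```

Under this model, a non-empty line made up only of encoding errors is not matched, which is why "grep ." and "match non-empty lines" can disagree.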

Perhaps PCRE is different, and if libpcre worked with encoding errors we could simply use its way of matching them. But there doesn't seem to be a safe way to do that.
