bug-grep

bug#30326: grep not searching through a text file (thinking it binary)


From: Paul Eggert
Subject: bug#30326: grep not searching through a text file (thinking it binary)
Date: Fri, 2 Feb 2018 15:44:45 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 02/02/2018 03:30 PM, L A Walsh wrote:
> most computer files (vs. user-files) are still single-byte.

That's because so many of them are ASCII. But ASCII files are not the issue here. grep's behavior hasn't changed when operating on ASCII files in typical locales. The issue is text using a non-ASCII encoding that is not compatible with your locale; e.g., if your text file uses ISO 8859-1 but your locale specifies UTF-8.
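To make the mismatch concrete, here is a minimal Python sketch. It is an illustration only, not grep's actual detection logic, and the helper name `decodes_as` and the sample strings are made up for this example. It shows that pure ASCII bytes are valid under either encoding, while ISO 8859-1 bytes for a non-ASCII character are not valid UTF-8:

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if every byte sequence in data is valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample data: the same word in three byte representations.
ascii_text = b"cafe\n"                           # pure ASCII
latin1_text = "café\n".encode("iso-8859-1")      # é is the single byte 0xE9
utf8_text = "café\n".encode("utf-8")             # é is the two bytes 0xC3 0xA9

for name, data in [("ascii", ascii_text),
                   ("latin1", latin1_text),
                   ("utf8", utf8_text)]:
    print(name, "valid as UTF-8:", decodes_as(data, "utf-8"))
```

Only the ISO 8859-1 sample fails the UTF-8 check, because the lone byte 0xE9 starts a multi-byte sequence that the following newline does not complete. Note that the reverse check is uninformative: every byte string decodes as ISO 8859-1, so a UTF-8 file read that way yields wrong characters rather than an error.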

In my experience, UTF-8 has long been winning this battle, in the sense that UTF-8 is by far the dominant encoding for the non-ASCII files I regularly use. So I use a UTF-8 locale, and suggest this as a good default for most users nowadays.

It's not possible to get direct statistics about encoding for all user files. However, we can see what's being published on the web. Currently UTF-8 is being used by about 90% of public websites whose character encoding can be determined, according to the latest W3Techs survey. ISO 8859-1 is in second place, at about 4%. See:

https://w3techs.com/technologies/overview/character_encoding/all
