--- Begin Message ---
Subject: grep 2.21-1 identifies iso encoded text files as binary
Date: Mon, 15 Dec 2014 15:22:00 +0100
User-agent: Mutt/1.5.23 (2014-03-12)
Hi,
I noticed that grep 2.21-1 treats ISO-8859-15 encoded files as binary if
LC_ALL is set to en_US.UTF8.
I am not sure if this is a bug or an expected behaviour change in 2.21-1, but
since I could not find anything in the changelog that directly mentions it, I am
reporting it. (I could not find anything on http://debbugs.gnu.org)
How to reproduce:
Create an ISO-8859-15 encoded test file named testfile containing: test ä ö ü
export LC_ALL=en_US.UTF8
grep test testfile
Binary file test matches
export LC_ALL=en_US
(grep works as expected)
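
The steps above can be sketched as a script. This is a minimal sketch, assuming a POSIX shell, a printf that accepts octal escapes, and an installed locale named en_US.UTF-8 (the locale name is an assumption; adjust it to whatever `locale -a` lists). The octal bytes 344, 366 and 374 are ä, ö and ü in ISO-8859-15. The last two commands are possible workarounds, not part of the original report.

```shell
# Write "test ä ö ü" as raw ISO-8859-15 bytes (octal 344, 366, 374).
printf 'test \344 \366 \374\n' > testfile

# Under a UTF-8 locale, grep 2.21 is reported to print a
# "Binary file ... matches" line instead of the matching text.
LC_ALL=en_US.UTF-8 grep test testfile

# Possible workaround 1: -a (--text) makes grep process the file as text.
LC_ALL=en_US.UTF-8 grep -a test testfile

# Possible workaround 2: convert the file to UTF-8 before searching.
iconv -f ISO-8859-15 -t UTF-8 testfile | grep test
```

What the first grep prints depends on the grep version and on whether the locale is actually installed, so no expected output is shown here.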
The behaviour with LC_ALL=en_US.UTF8 changed in 2.21-1; it worked correctly
in 2.20-1.
I am testing this on Arch Linux with glibc 2.20-4 (if that is relevant).
Please let me know if you need more information.
Regards,
Martin
--
Martin Hoch Friedrich-Bergius-Ring 15
fidion GmbH 97076 Würzburg
--- End Message ---
--- Begin Message ---
Subject: Re: bug#19388: grep 2.21-1 identifies iso encoded text files as binary
Date: Mon, 15 Dec 2014 23:12:10 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
Martin Hoch wrote:
> I noticed that grep 2.21-1 regards ISO-8859-15 encoded files as binary, if
> LC_ALL is set to en_US.UTF8.
> I am not sure if this is a bug or an expected behaviour change in 2.21-1
It's an expected change. Although this was documented in NEWS:
    If a file contains data improperly encoded for the current locale,
    and this is discovered before any of the file's contents are output,
    grep now treats the file as binary.
the grep manual is not so clear about it. I installed the attached patch to try
to fix that.
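
To see what "improperly encoded for the current locale" means for a UTF-8 locale, one can check a file by hand. This is a rough sketch, assuming iconv is available: a UTF-8 to UTF-8 conversion fails on invalid byte sequences. It is not a description of how grep detects the condition internally, and `is_valid_utf8`, `latin9file` and `utf8file` are hypothetical names used only for illustration.

```shell
# Hypothetical helper: succeeds only if the file decodes as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# ISO-8859-15 bytes for "test ä ö ü": not valid UTF-8.
printf 'test \344 \366 \374\n' > latin9file
# "test ä" with ä encoded as UTF-8 (0xC3 0xA4).
printf 'test \303\244\n' > utf8file

for f in latin9file utf8file; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: encoding errors under UTF-8"
    fi
done
```

A file in the second category is the kind that the NEWS entry says grep treats as binary when the error is found before any output.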
0001-doc-document-binary-data-heuristic-better.patch
Description: Text Data
--- End Message ---