bug#28255: grep erroneously skips Microsoft UTF-8 text files as being binary

bug-grep

From:	Paul Eggert
Subject:	bug#28255: grep erroneously skips Microsoft UTF-8 text files as being binary
Date:	Sun, 27 Aug 2017 14:47:28 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

Simon wrote:

> Windows text files can start with a byte order mark of U+FEFF and then
> be encoded in UTF-8.  These are skipped as being binary files.


I can't reproduce this problem on Fedora 26 x86-64. Here's how I tried:

$ printf '\357\273\277x\n' >t
$ LC_ALL=C grep x t | od -c
0000000 357 273 277   x  \n
0000005
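
The same check can be scripted so that it is easier to rerun on another platform. This is only a minimal sketch, not part of the original report: the temporary file from mktemp and the variable names (f, bom, result) are illustrative assumptions, and it assumes a head that supports -c. It writes the UTF-8 byte order mark followed by "x", confirms the first three bytes, and then checks whether grep's output is byte-for-byte identical to the input, which would only be the case if grep treated the file as text.

```shell
# Hypothetical scratch file; the name is illustrative only.
f=$(mktemp)

# Write the UTF-8 BOM (bytes EF BB BF, octal 357 273 277), then "x" and a newline.
printf '\357\273\277x\n' >"$f"

# Dump the first three bytes as hex to confirm the BOM is really there.
bom=$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')

# If grep treats the file as text, it prints the single matching line unchanged,
# so its output is identical to the file. Any other output (for example a
# "binary file matches" notice) makes cmp fail.
if LC_ALL=C grep x "$f" | cmp -s - "$f"; then
  result=text
else
  result=binary
fi

echo "bom=$bom result=$result"
rm -f "$f"
```

A result of "binary" on some platform, together with the grep version and locale settings in use there, would be the kind of self-contained example that helps narrow the problem down.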

To help us diagnose the problem, please send a simple, self-contained example, and mention your platform.

Current Thread

bug#28255: grep erroneously skips Microsoft UTF-8 text files as being binary, Simon, 2017/08/27
- bug#28255: grep erroneously skips Microsoft UTF-8 text files as being binary, Paul Eggert <=
  - Message not available
    - bug#28255: grep erroneously skips Microsoft UTF-8 text files as being binary, Paul Eggert, 2017/08/27