bug#22838: New 'Binary file' detection considered harmful

From: Eric Blake
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Mon, 29 Feb 2016 10:54:52 -0700

On 02/29/2016 10:40 AM, Marcello Perathoner wrote:
>> Wrong, at least according to the POSIX definition of text file.  A text
>> file is one with no encoding errors.
> """
> 3.397 Text File
> A file that contains characters organized into zero or more lines. The
> lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> in length, including the <newline> character. Although POSIX.1-2008 does
> not distinguish between text files and binary files (see the ISO C
> standard), many utilities only produce predictable or meaningful output
> when operating on text files. The standard utilities that have such
> restrictions always specify "text files" in their STDIN or INPUT FILES
> sections.
>
> 3.206 Line
> A sequence of zero or more non-<newline> characters plus a terminating
> <newline> character.
>
> 3.87 Character
> A sequence of one or more bytes representing a single graphic symbol or
> control code.
> Note:
> This term corresponds to the ISO C standard term multi-byte character, where 
> a single-byte character is a special case of a multi-byte character. Unlike 
> the usage in the ISO C standard, character here has no necessary relationship 
> with storage space, and byte is used when storage space is discussed.
> See the definition of the portable character set in Portable Character Set 
> for a further explanation of the graphical representations of (abstract) 
> characters, as opposed to character encodings.

Encoding errors are not characters, but bytes.  A line is defined as a
sequence of characters, so it cannot contain encoding errors.  Therefore,
a file with encoding errors is not a text file.
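
The quoted definitions can be turned into a mechanical check.  What follows
is only a rough sketch in Python, not what grep does internally.  The
function name is_posix_text is made up for illustration, the locale's
encoding is assumed to be passed in by name (UTF-8 by default), and
LINE_MAX is assumed to be 2048, the POSIX minimum; a real system may
report a larger value.  Splitting on the byte 0x0A is also an assumption
that holds for UTF-8 and single-byte encodings, but not for something
like UTF-16.

```python
# Assumption: 2048 is the POSIX minimum for {LINE_MAX} (_POSIX2_LINE_MAX);
# the actual limit is system-dependent.
LINE_MAX = 2048


def is_posix_text(data: bytes, encoding: str = "utf-8",
                  line_max: int = LINE_MAX) -> bool:
    """Hypothetical helper: does `data` fit the quoted text-file definition?"""
    # Bytes that do not decode are not characters, so they cannot be
    # part of a line.  decode() is strict by default.
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False

    # Lines must not contain NUL characters.
    if b"\x00" in data:
        return False

    # Every line needs a terminating <newline>; an empty file has zero
    # lines and is still a text file.
    if data and not data.endswith(b"\n"):
        return False

    # No line may exceed line_max bytes, counting the <newline> itself.
    # The last element of split() is the empty tail after the final newline.
    lines = data.split(b"\n")[:-1]
    return all(len(line) + 1 <= line_max for line in lines)
```

Under these assumptions, b"caf\xe9\n" is rejected when the encoding is
UTF-8 but accepted when it is Latin-1, which is the point: whether a file
is a text file depends on the locale it is read in.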

