Subject: bug#20526: BUG: text file is detected as binary
Date: Mon, 11 May 2015 21:27:35 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
Kamil Dudka wrote:
> Which bug does it fix?
I don't recall a bug report being filed for it, but the old grep behavior had real problems: as I remember, it sometimes dumped core, and at other times it wrote improperly encoded data to the terminal. We've fixed the core dumps I know about, though I think grep still outputs improperly encoded data at times (and this should get fixed too -- see below for a suggestion).
At any rate, applications could never assume a particular behavior for improperly encoded files, so the current behavior is clearly not a bug. Users may be able to scrape along by setting LC_ALL=C before running 'grep' -- the problems LC_ALL=C runs into are about the same as the problems with using old 'grep' (except that the new grep doesn't dump core :-).
Perhaps we can improve the behavior of grep by changing its heuristic slightly. Currently grep reports "Binary file FOO matches" if it finds binary data in FOO before it finds the first match. Instead, perhaps we could change grep to report "Binary file FOO matches" when it sees that it's about to generate binary *output* copied from FOO, regardless of whether this output represents the first match. That is, when grep sees that it's about to output binary data, grep instead outputs "Binary file FOO matches" and then stops output for FOO (even if it already output some lines for ordinary matches in FOO).
This approach would fix the problem of grep trashing the output stream, and it should be less drastic than grep's current approach, in that it would make grep more likely to do what Kamil Dudka is asking for (assuming grep is given mostly valid input interspersed with small amounts of binary data).
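To make the proposed heuristic concrete, here is a rough sketch in Python. It is only an illustration, not grep's actual code: the names is_binary and grep_lines are made up for this example, and it assumes that "binary" means a line containing a NUL byte or a byte sequence that is not valid UTF-8, which is a simplification of what grep checks against the current locale.

```python
import re


def is_binary(line: bytes) -> bool:
    """Assumption for this sketch: NUL bytes or invalid UTF-8 count as binary."""
    if b"\0" in line:
        return True
    try:
        line.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def grep_lines(pattern: bytes, name: str, lines):
    """Yield the output for one file under the proposed heuristic.

    Matching lines are passed through until a matching line would put
    binary data on the output; at that point the "Binary file" message
    is emitted and output for this file stops.
    """
    regex = re.compile(pattern)
    for line in lines:
        if not regex.search(line):
            continue  # non-matching lines never reach the output
        if is_binary(line):
            yield f"Binary file {name} matches\n".encode()
            return
        yield line


sample = [b"foo one\n", b"bar\n", b"foo \xff two\n", b"foo three\n"]
out = list(grep_lines(b"foo", "FOO", sample))
```

With this sample, the first match is printed normally, the second match triggers the "Binary file FOO matches" message, and the third match is suppressed. Binary data on a line that does not match has no effect, since that line would never be output.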