bug-grep

bug#22838: New 'Binary file' detection considered harmful


From: Paul Eggert
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Sun, 28 Feb 2016 14:13:43 -0800

Marcello Perathoner wrote:

The new behaviour of grep -- to output 'Binary file matches' after output started

I assume that the "new behavior" you're talking about is for grep 2.21 (2014-11-23) and later, as that's the version of grep that started outputting "Binary file matches" due to input encoding errors. For example, on my platform (Ubuntu 15.10), the shell command:

LC_ALL=C awk 'BEGIN {for(i=1; i<256; i++) printf "%c %d\n", i, i}' |
LC_ALL=en_US.utf8 grep 126

outputs "Binary file (standard input) matches" in grep 2.21.

These changes were made partly for security reasons. Some of the issues involved grep's internals (the old 'grep' would sometimes dump core when given encoding errors), and others affected invokers that expect grep's output to be properly encoded text.

To some extent we were stuck between a rock and a hard place here. No matter what 'grep' does, it will do the wrong thing for some usages. But overall we thought it better for grep's output to be valid text.

I think you can work around the problem for unfixed backup2l by setting your system's locale to a unibyte locale in which all bytes are valid, such as en_US.ISO-8859-15.

Of course backup2l should get fixed, regardless of what we do with 'grep' or with your system locale.

$ find /etc/ssl/certs/ | LANG= grep pem

Wouldn't the following be better?

find /etc/ssl/certs/ -name '*.pem'

This avoids false matches like '/etc/ssl/certs/pemmican'.  Alternatively:

find /etc/ssl/certs/ -print | grep -a '\.pem$'
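
A quick way to see the difference, as a sketch that uses two made-up paths in place of real find output (the name example.pem is hypothetical):

```shell
# Two made-up paths standing in for the output of find.
paths='/etc/ssl/certs/example.pem
/etc/ssl/certs/pemmican'

# Unanchored pattern: counts both lines, including the false match.
loose=$(printf '%s\n' "$paths" | grep -c 'pem')

# Anchored pattern: counts only names that end in ".pem".
strict=$(printf '%s\n' "$paths" | grep -ac '\.pem$')

echo "loose=$loose strict=$strict"
```

The unanchored pattern counts two lines and the anchored one counts one.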

It is easy to catch such a file from the internet or from song or picture metadata.

None of the above approaches will work for arbitrary file names ("off the Internet"), because they all mishandle file names containing newlines. backup2l needs to do something like this:

find /etc/ssl/certs/ -name '*.pem' -print0

or like this:

find /etc/ssl/certs/ -print0 | grep -az '\.pem$'

with the remaining code using null bytes instead of newlines to terminate file names. This is the sort of thing that backup2l should be doing.
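
As a rough sketch of what null-terminated processing could look like, the following counts matching names in a scratch directory. The helper names count_pem_find and count_pem_grep and the test file names are hypothetical and are not taken from backup2l; the sketch assumes find, grep and tr implementations that support -print0, -z and '\000' (the GNU ones do).

```shell
# Hypothetical helpers, not taken from backup2l. They assume find and
# grep implementations that support -print0 and -z (GNU does).

# Count names ending in ".pem" by counting find's null terminators.
count_pem_find() {
  find "$1" -name '*.pem' -print0 | tr -dc '\000' | wc -c
}

# Same count, but filtering null-terminated names with grep -z.
count_pem_grep() {
  find "$1" -print0 | grep -az '\.pem$' | tr -dc '\000' | wc -c
}

# Scratch directory with one file whose name contains a newline.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.pem" "$demo_dir/pemmican" "$demo_dir/odd
name.pem"

by_find=$(count_pem_find "$demo_dir")
by_grep=$(count_pem_grep "$demo_dir")

# Newline-based counting, for contrast: one name spans two lines.
by_lines=$(find "$demo_dir" -name '*.pem' -print | wc -l)

rm -rf "$demo_dir"
echo "find: $by_find, grep -z: $by_grep, newline-based: $by_lines"
```

With these three test files, both null-terminated counts should be 2, while the newline-based count is 3 because the embedded newline splits one name across two lines.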




