emacs-bug-tracker

From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#30326: closed (grep not searching through a text file (thinking it binary))
Date: Fri, 02 Feb 2018 19:56:02 +0000

Your message dated Fri, 2 Feb 2018 13:55:00 -0600
with message-id <address@hidden>
and subject line Re: bug#30326: grep not searching through a text file 
(thinking it binary)
has caused the debbugs.gnu.org bug report #30326,
regarding grep not searching through a text file (thinking it binary)
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
30326: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=30326
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: grep not searching through a text file (thinking it binary)
Date: Fri, 02 Feb 2018 11:30:07 -0800
User-agent: Thunderbird

I've used grep to search through my mbox-format emails for decades, but
I've run into a case where it seems to be ignoring a text mailbox
because, I guess, it thinks it is "binary" (I think ignoring binary
is a default in my aliases file).

I used:

 grep -Pr 'Game:\s+NCSOFT' *

and it ignored a mailbox named 'Domain' that contained the
string:
"                                    =E2=80=A2=09Game: NCSOFT"

 file Domain
Domain: Non-ISO extended-ASCII text, with very long lines


If I used "-Par" it finds it.

It seems that grep believes the file to be binary and ignores it, though
"file" calls it "text".

Any ideas?

grep -V
grep (GNU grep) 2.21.31-adf9

Maybe grep is being a bit overzealous in calling files 'binary'?

--- End Message ---
--- Begin Message ---
Subject: Re: bug#30326: grep not searching through a text file (thinking it binary)
Date: Fri, 2 Feb 2018 13:55:00 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

tag 30326 notabug
thanks

On 02/02/2018 01:30 PM, L. A. Walsh wrote:
> I've used grep to search through my mbox-format emails for decades, but
> I've run into a case where it seems to be ignoring a text mailbox
> because, I guess, it thinks it is "binary"

Yes, that's correct.

> If I used "-Par" it finds it.

Yes, that's also correct.

> 
> It seems that grep believes the file to be binary and ignores it, though
> "file" calls it "text".

The file is conditionally text.  The POSIX definition of a text file is
one whose lines consist of valid characters in the current locale - but
note this definition is locale-dependent!  So a file that is text under
one locale may be binary under another.  When you are grepping a file
encoded correctly for the current locale, you get the output you want;
when you are grepping a file that contains encoding errors for the
current locale, POSIX says behavior is undefined, so GNU grep warns you
that the file is binary (in the current locale); and your use of -a
tells grep to process it anyway.  As 'file' reported that your file was
using non-ISO extended-ASCII, it probably means the file was encoded for
an 8-bit single-byte locale; and my guess is that you were running grep
under a UTF-8 locale, where a byte with the high bit set that is not
part of a valid multi-byte sequence is an encoding error.  Hence the
warning that your file is binary, under the current locale.
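
To make the locale dependence concrete, here is a minimal sketch that
checks one file against two encodings.  It uses iconv rather than grep
so that it does not depend on which locales are installed.  The sample
file and its contents are hypothetical, not the reporter's mailbox.

```shell
# Hypothetical sample: one line containing byte 0xE9 ("e" with acute
# accent in ISO-8859-1), followed by ordinary ASCII text.
sample=$(mktemp)
printf 'caf\351 Game:   NCSOFT\n' > "$sample"

# iconv exits non-zero when the input is not valid in the source encoding.
if iconv -f UTF-8 -t UTF-8 "$sample" >/dev/null 2>&1; then
    utf8_valid=yes
else
    utf8_valid=no
fi

if iconv -f ISO-8859-1 -t UTF-8 "$sample" >/dev/null 2>&1; then
    latin1_valid=yes
else
    latin1_valid=no
fi

echo "valid as UTF-8: $utf8_valid; valid as ISO-8859-1: $latin1_valid"
rm -f "$sample"
```

The same bytes are well-formed text in one encoding and contain an
encoding error in the other, which is the situation described above.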

You can also use 'LC_ALL=C grep' to force a locale where EVERY byte is a
valid character, and thus where you will never encounter encoding errors
(you may encounter OTHER things that make your file binary, such as
embedded NULs, but that's a different matter).
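
A minimal sketch of that approach follows.  The sample file is
hypothetical, and the pattern uses -E with [[:space:]] in place of the
original -P with \s, on the assumption that not every grep build has
PCRE support; -c is used only so that the result is a simple count.

```shell
# Hypothetical sample: a high-bit byte (0x95), a tab, then the text
# being searched for.
sample=$(mktemp)
printf '\225\tGame:   NCSOFT\n' > "$sample"

# In the C locale every byte is a valid character, so the high-bit
# byte cannot be an encoding error.
matches=$(LC_ALL=C grep -c -E 'Game:[[:space:]]+NCSOFT' "$sample")

echo "matching lines under LC_ALL=C: $matches"
rm -f "$sample"
```

Setting LC_ALL only on the grep command line keeps the change local to
that one invocation and leaves the rest of the shell session alone.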

This behavior is documented and intentional, so I'm closing this as not
a bug in the tracker.  However, feel free to add further comments or
questions to the thread.

And perhaps we could tweak the grep diagnostics to clarify whether a
file is binary because NUL bytes were encountered, vs. a file is binary
because encoding errors were encountered.
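
Until then, one way to tell the two cases apart by hand is to count the
NUL bytes in the file.  This is only a sketch with a hypothetical
sample file; it is not something grep itself reports.

```shell
# Hypothetical sample containing two NUL bytes.
sample=$(mktemp)
printf 'a\000b\000c\n' > "$sample"

# Delete every byte except NUL, then count what is left.
nul_count=$(tr -cd '\000' < "$sample" | wc -c)
nul_count=$((nul_count))   # strip any padding that wc adds

if [ "$nul_count" -gt 0 ]; then
    echo "file contains $nul_count NUL byte(s)"
else
    echo "no NUL bytes; a binary verdict would come from encoding errors"
fi
rm -f "$sample"
```

A count of zero suggests that a "binary" verdict came from encoding
errors in the current locale rather than from embedded NULs.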

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


--- End Message ---