bug#30326: grep not searching through a text file (thinking it binary)
From: Eric Blake
Subject: bug#30326: grep not searching through a text file (thinking it binary)
Date: Fri, 2 Feb 2018 13:55:00 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2
tag 30326 notabug
thanks
On 02/02/2018 01:30 PM, L. A. Walsh wrote:
> I've used grep to search through my mbox-format emails for decades, but
> I've run into a case where it seems to ignore a text mailbox
> because, I guess, it thinks it is "binary"
Yes, that's correct.
> If I used "-Par" it finds it.
Yes, that's also correct.
>
> It seems that grep believes the file to be binary and ignores it, though
> "file" calls it "text".
The file is conditionally text. POSIX defines a text file as one whose
lines consist of valid characters in the current locale, and note that
this definition is locale-dependent! So a file that is text under one
locale may be binary under another. When you grep a file encoded
correctly for the current locale, you get the output you want. When you
grep a file that contains encoding errors for the current locale, POSIX
says the behavior is undefined, so GNU grep warns you that the file is
binary (in the current locale), and your use of -a tells grep to process
it anyway. Since 'file' reported that your file uses non-ISO
extended-ASCII, it was probably encoded for an 8-bit single-byte locale.
My guess is that you were running grep under a UTF-8 locale, where the
high-bit bytes of single-byte text generally do not form valid UTF-8
sequences and therefore count as encoding errors. Hence the warning that
your file is binary under the current locale.
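To make the locale dependence concrete, here is a minimal Python sketch. It is an illustration only, not how grep is implemented. Python codecs stand in for locales: ISO-8859-1 ("latin-1") is assumed as the single-byte encoding the mailbox was written in, and UTF-8 as the encoding of the locale grep ran under. The helper name valid_in_locale is hypothetical.

```python
def valid_in_locale(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if `data` decodes cleanly in `encoding`,
    which stands in for the current locale's character set."""
    try:
        data.decode(encoding)  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# A mailbox line written by a Latin-1 mail client: "é" is the single byte 0xE9.
line = "Subject: caf\u00e9\n".encode("latin-1")

print(valid_in_locale(line, "latin-1"))  # True: text under a Latin-1 locale
print(valid_in_locale(line, "utf-8"))    # False: 0xE9 then "\n" is malformed UTF-8
```

The bytes never change. Only the encoding used to judge them does, and that alone flips the verdict from text to binary.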
You can also use 'LC_ALL=C grep' to force a locale where EVERY byte is a
valid character, and thus where you will never encounter encoding errors
(you may encounter OTHER things that make your file binary, such as
embedded NULs, but that's a different matter).
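The C-locale point can be sketched with the same hypothetical Python model. In the C locale each of the 256 byte values is one character. Latin-1 decoding mimics that because it maps every byte to exactly one code point and can never fail. So in this model the only remaining binary trigger is an embedded NUL. The function name binary_in_c_locale is made up for illustration and is not grep's code.

```python
# Hypothetical model of the C locale: every byte 0x00-0xFF is one character,
# so Latin-1 decoding of any byte string succeeds.
ALL_BYTES = bytes(range(256))
assert len(ALL_BYTES.decode("latin-1")) == 256  # no encoding errors possible


def binary_in_c_locale(data: bytes) -> bool:
    """Under this C-locale model, only an embedded NUL makes data binary."""
    return b"\x00" in data


print(binary_in_c_locale("Subject: caf\u00e9\n".encode("latin-1")))  # False
print(binary_in_c_locale(b"header\x00payload"))                     # True
```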
This behavior is documented and intentional, so I'm closing this as not
a bug in the tracker. However, feel free to add further comments or
questions to the thread.
And perhaps we could tweak the grep diagnostics to clarify whether a
file is binary because NUL bytes were encountered, vs. a file is binary
because encoding errors were encountered.
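Here is a sketch of what such a clarified diagnostic could report, again in Python and again hypothetical. binary_reason is an invented helper, not part of grep, and its message wording is made up. It checks for a NUL byte first, then for the first decoding error in the given encoding, and it reports the byte offset of whichever it finds.

```python
from typing import Optional


def binary_reason(data: bytes, encoding: str) -> Optional[str]:
    """Hypothetical helper: explain why `data` would be treated as binary
    in `encoding`, or return None if it is text."""
    nul = data.find(b"\x00")
    if nul != -1:
        return f"NUL byte at offset {nul}"
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that failed to decode.
        return f"encoding error ({encoding}) at offset {err.start}"
    return None


mbox = "Subject: caf\u00e9\n".encode("latin-1")
print(binary_reason(mbox, "utf-8"))       # encoding error (utf-8) at offset 12
print(binary_reason(b"a\x00b", "utf-8"))  # NUL byte at offset 1
print(binary_reason(mbox, "latin-1"))     # None
```

An offset would let the user inspect the offending byte directly and judge which encoding the file really uses.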
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
- bug#30326: grep not searching through a text file (thinking it binary), L. A. Walsh, 2018/02/02
- bug#30326: grep not searching through a text file (thinking it binary), Eric Blake <=
- bug#30326: grep not searching through a text file (thinking it binary), L A Walsh, 2018/02/02
- bug#30326: grep not searching through a text file (thinking it binary), Paul Eggert, 2018/02/02
- bug#30326: grep not searching through a text file (thinking it binary), L A Walsh, 2018/02/02
- bug#30326: grep not searching through a text file (thinking it binary), Paul Eggert, 2018/02/02
- bug#30326: grep not searching through a text file (thinking it binary), L A Walsh, 2018/02/02
- bug#30326: grep not searching through a text file (thinking it binary), Paul Eggert, 2018/02/02
- bug#30326: grep not searching through a text file (thinking it binary), L A Walsh, 2018/02/02
- bug#30326: grep not searching through a text file (thinking it binary), Paul Eggert, 2018/02/04
- bug#30326: grep not searching through a text file (thinking it binary), Paul Jackson, 2018/02/05
- bug#30326: grep not searching through a text file (thinking it binary), Paul Eggert, 2018/02/05