bug#25707: [PATCH] grep: don't forcefully strip carriage returns
From: Eli Zaretskii
Subject: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 20:06:07 +0200
> Cc: address@hidden, address@hidden
> From: Eric Blake <address@hidden>
> Date: Thu, 16 Feb 2017 11:40:29 -0600
>
> On 02/16/2017 11:26 AM, Eli Zaretskii wrote:
>
> >> I'm of the opinion that undossify_input causes more problems than it
> >> solves. We should trust fopen("r") to do the right thing, rather than
> >> reinventing it ourselves.
> >
> > FYI: You'd be losing an important feature for non-Cygwin DOS/Windows
> > users if you remove undossify_input and decide to trust fopen's "r"
> > (or "rt") mode.
>
> "rt" mode is not required to exist. And I don't know any modern
> implementation of "r" mode on a system with non-zero O_BINARY that eats
> ALL \r - both Cygwin and mingw just change \r\n into \n while still
> preserving other \r. The undossify() code that Paul just removed did
> NOT behave the same as text mode (in that it did, perhaps
> unintentionally, eat ALL \r).
I explicitly said my comments were not about Cygwin.
And you are forgetting the "stop at first ^Z" misfeature of text-mode
reads.
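
For illustration, the two behaviors contrasted above can be sketched in C. This is a simplified model under stated assumptions, not the code removed from grep and not the text mode of any particular C runtime; the names `strip_all_cr` and `text_mode_translate` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove every CR byte in place and return the new length.  Models the
   "eat ALL \r" behavior attributed above to the removed code. */
size_t strip_all_cr(char *buf, size_t n)
{
    assert(buf != NULL);
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != '\r')
            buf[out++] = buf[i];
    return out;
}

/* Simplified model of a DOS/Windows text-mode read (an assumption):
   CR LF becomes LF, a lone CR is kept, and input ends at the first
   ^Z (0x1A).  Works in place and returns the new length. */
size_t text_mode_translate(char *buf, size_t n)
{
    assert(buf != NULL);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == 0x1A)
            break;              /* the "stop at first ^Z" behavior */
        if (buf[i] == '\r' && i + 1 < n && buf[i + 1] == '\n')
            continue;           /* drop only the CR of a CR LF pair */
        buf[out++] = buf[i];
    }
    return out;
}
```

In this model the two functions disagree on any input that contains a lone CR or a ^Z byte, which is the disagreement at issue in this thread.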
> I count it worse to TRY and reimplement the OS "r" mode and get the
> implementation wrong, with more lines of code, than to just trust the OS
> to do it correctly in the first place.
It is no use trusting the OS if it doesn't DTRT.
> The undossify() code may do the right thing on text files, but is
> absolutely wrong on binary files.
Grep is mainly a text-processing program. Its use with binary files
is a much rarer use case, and the user has opt-in options for those.
IMO it is more important to DTRT by default in the usual cases than
to avoid erring in the rare cases where the user fails to specify the
opt-in options needed to handle binary files correctly.
> No, I don't know of any fopen(,"r") code that eats _all_ CR.
I do.
> Yes, you do make a point that the side effect of reimplementing text
> mode ourselves on a forced binary fd lets us "count" byte offsets where
> the count could be text while the scan was binary, or where the count
> could be binary while the scan was text. But in reality, are there any
> users that ever want a mixed-mode count?
Yes.
> If you are scanning in binary, you want the binary count; if you are
> scanning in text, you want the text count.
That's up to the user; Grep shouldn't second-guess its users, and
shouldn't force them into a specific modus operandi without a fire
escape.
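
To make the "binary count" versus "text count" distinction concrete, here is a minimal sketch in C. It assumes the only difference between the two views is that the CR of each CR LF pair is dropped; the name `text_offset` is hypothetical and is not part of grep.

```c
#include <assert.h>
#include <stddef.h>

/* Map an offset into the raw (binary) bytes to the offset of the same
   position once the CR of every CR LF pair has been dropped. */
size_t text_offset(const char *raw, size_t n, size_t raw_off)
{
    assert(raw != NULL && raw_off <= n);
    size_t dropped = 0;
    /* Count the CR LF pairs whose CR lies before raw_off. */
    for (size_t i = 0; i + 1 < n && i < raw_off; i++)
        if (raw[i] == '\r' && raw[i + 1] == '\n')
            dropped++;
    return raw_off - dropped;
}
```

For the input `"one\r\ntwo\r\nthree\r\n"`, the word `three` starts at raw offset 10 but at text offset 8, so a tool that scans one view and reports offsets in the other gives what is called a mixed-mode count above.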
But this is a futile argument: I don't expect to win it, and I'm well
aware that recent Grep releases are much less friendly to non-Posix
users than previous ones. Which is why I will stay with Grep 2.10,
which works satisfactorily for me.
The purpose of my message was to go on record about what these
changes mean: they mean losing features which some find useful. You
are not changing code which was written by someone who didn't know
about text-mode I/O, or didn't understand that this could be done
better, faster, etc., by dropping features.