bug#25707: [PATCH] grep: don't forcefully strip carriage returns

From: Eli Zaretskii
Subject: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 20:06:07 +0200

> Cc: address@hidden, address@hidden
> From: Eric Blake <address@hidden>
> Date: Thu, 16 Feb 2017 11:40:29 -0600
> 
> On 02/16/2017 11:26 AM, Eli Zaretskii wrote:
> >> I'm of the opinion that undossify_input causes more problems than it
> >> solves.  We should trust fopen("r") to do the right thing, rather than
> >> reinventing it ourselves.
> > 
> > FYI: You'd be losing an important feature for non-Cygwin DOS/Windows
> > users if you remove undossify_input and decide to trust fopen's "r"
> > (or "rt") mode.
> 
> "rt" mode is not required to exist. And I don't know any modern
> implementation of "r" mode on a system with non-zero O_BINARY that eats
> ALL \r - both Cygwin and mingw just change \r\n into \n while still
> preserving other \r.  The undossify() code that Paul just removed did
> NOT behave the same as text mode (in that it did, perhaps
> unintentionally, eat ALL \r).
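
For readers following the thread, the two behaviors being contrasted above can be modeled on raw bytes. This is only a sketch: the function names are made up for illustration, and neither function is Grep's code or any C runtime's implementation.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Model of the text-mode translation described above:
    only CR LF pairs become LF; a lone CR is preserved."""
    return data.replace(b"\r\n", b"\n")


def strip_all_cr(data: bytes) -> bytes:
    """Model of the 'eat ALL \\r' behavior attributed to the removed
    code: every CR byte is dropped, with or without a following LF."""
    return data.replace(b"\r", b"")


sample = b"one\r\ntwo\rstill two\r\n"
print(crlf_to_lf(sample))    # b'one\ntwo\rstill two\n'
print(strip_all_cr(sample))  # b'one\ntwostill two\n'
```

The two results differ only on the lone CR in the middle of the second line.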

I explicitly said my comments were not about Cygwin.

And you are forgetting the "stop at first ^Z" misfeature of text-mode
I/O.
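
As a rough illustration of the ^Z point, the sketch below models a DOS-style text-mode read in which the first Ctrl-Z byte (0x1A) is treated as end of file. It is a simplified model with a hypothetical function name, not the code of any particular C library, and it assumes the CR LF translation is applied to whatever precedes the ^Z.

```python
CTRL_Z = b"\x1a"  # the byte DOS-style text mode treats as end of file


def text_mode_view(data: bytes) -> bytes:
    """Model of a text-mode read: stop at the first ^Z,
    then translate CR LF pairs to LF in what remains."""
    head, _sep, _rest = data.partition(CTRL_Z)
    return head.replace(b"\r\n", b"\n")


raw = b"first\r\n\x1asecond\r\n"
print(text_mode_view(raw))  # b'first\n'
```

A program that reads the same file in binary mode and handles line endings itself would still see the line after the ^Z.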

> I count it worse to TRY and reimplement the OS "r" mode and get the
> implementation wrong, with more lines of code, than to just trust the OS
> to do it correctly in the first place.

It is no use trusting the OS if it doesn't DTRT.

> The undossify() code may do the right thing on text files, but is
> absolutely wrong on binary files.

Grep is mainly a text-processing program.  Its use with binary files
is a much rarer use case, and the user has opt-in options for those.
IMO it is more important to DTRT by default in the usual cases, even
at the cost of erring in the rare cases where the user fails to
specify the opt-in options needed to handle binary files correctly.

> No, I don't know of any fopen(,"r") code that eats _all_ CR.

I do.

> Yes, you do make a point that the side effect of reimplementing text
> mode ourselves on a forced binary fd lets us "count" byte offsets where
> the count could be text while the scan was binary, or where the count
> could be binary while the scan was text.  But in reality, are there any
> users that ever want a mixed-mode count?


> If you are scanning in binary, you want the binary count; if you are
> scanning in text, you want the text count.
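
The "binary count" versus "text count" distinction can be made concrete with a small sketch that computes the byte offset at which each line starts, once on the raw bytes and once after CR LF translation. The helper name is hypothetical and this is not how Grep computes offsets; it only shows why the two counts diverge.

```python
def line_start_offsets(data: bytes) -> list:
    """Return the byte offset at which each LF-terminated line starts."""
    offsets = []
    pos = 0
    for line in data.split(b"\n"):
        if pos < len(data):          # skip the empty piece after a final LF
            offsets.append(pos)
        pos += len(line) + 1         # +1 for the LF that split() removed
    return offsets


raw = b"ab\r\ncd\r\nef\r\n"
translated = raw.replace(b"\r\n", b"\n")

print(line_start_offsets(raw))         # [0, 4, 8]  (binary count)
print(line_start_offsets(translated))  # [0, 3, 6]  (text count)
```

Each CR LF pair before a line shifts that line's binary offset one byte past its text offset, so the gap grows with the line number.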

That's up to the user; Grep shouldn't second-guess its users, and
shouldn't force them into a specific modus operandi without a fire

But this is a futile argument: I don't expect to win it, and I'm well
aware that recent Grep releases are much less friendly to non-Posix
users than previous ones.  Which is why I will stay with Grep 2.10,
which works satisfactorily for me.

The purpose of my message was to get on record about what these
changes mean: they mean losing features which some find useful.  You
are not changing code which was written by someone who didn't know
about text-mode I/O, or didn't understand that this could be done
better, faster, etc., by dropping features.
