
bug#26879: end-of-line issue with cygwin 4.4-1 sed 4.4


From: Eric Blake
Subject: bug#26879: end-of-line issue with cygwin 4.4-1 sed 4.4
Date: Sat, 13 May 2017 15:05:59 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0

On 05/12/2017 04:58 PM, Dick Dunbar wrote:
> Oh, I didn't realize this sed wasn't the cygwin choice.

[please don't top-post on technical lists]

What do you mean by "this sed wasn't the cygwin choice"? Cygwin is using
GNU sed, and has been for many years.  In fact, I currently package sed
for cygwin downstream.

But the downstream choice of whether to add a hack to use open("rt") or
to use the upstream behavior of open("r") [or, in the case of the 4.4-1
package, to remove the hack] is exactly that - something that was
decided downstream by Cygwin people. Upstream has no control over what
additional patches (if any) downstream wants to use or avoid.  This list
is the upstream list. If you want your complaints to be heard by the
cygwin community at large, so other cygwin users can chime in on the
behavior that is best for the Cygwin distribution, then reach out to the
cygwin community, not this list.

> 
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=25707
> 
> Reading through those notes, it appears that the sed changes were
> predicated on something called a "text mount".

Yes, a cygwin text mount is what controls whether open("r") will strip
\r (stripped from a text mount, left intact from a binary mount).
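
To make the difference concrete, here is a minimal Python sketch. It is
only a model of the behavior described above, not Cygwin's implementation:
the helper names are hypothetical, and the "$ anchor" is approximated by a
plain suffix check on each line.

```python
from typing import List


def read_via_text_mount(raw: bytes) -> bytes:
    """Model of open("r") on a text mount: CRLF pairs arrive as LF."""
    return raw.replace(b"\r\n", b"\n")


def read_via_binary_mount(raw: bytes) -> bytes:
    """Model of open("r") on a binary mount: bytes arrive untouched."""
    return raw


def lines_ending_in(raw: bytes, suffix: bytes) -> List[bytes]:
    """Return the non-empty lines whose content (before the LF) ends with
    suffix, roughly what a pattern anchored with $ would select."""
    return [line for line in raw.split(b"\n") if line and line.endswith(suffix)]


dos_file = b"alpha\r\nbeta\r\n"

text_view = read_via_text_mount(dos_file)
binary_view = read_via_binary_mount(dos_file)

print(lines_ending_in(text_view, b"a"))    # both lines match
print(lines_ending_in(binary_view, b"a"))  # no match: each line ends in \r
```

The same input gives different matches purely because of how the bytes
were delivered to the program, which is the effect being reported here.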

> 
> And you were depending on the OS to " undossify_input".

undossify_input is not in the sed sources.  Now you are pointing to a
grep bug (yes, it is related, but let's be careful about what we are
attributing to sed, vs. what we are attributing to downstream).

For years, the grep project had a function named undossify_input that
tried to manually strip \r from files - except that it didn't do as
advertised. It didn't do anything on text mount files (since \r was
already stripped), and it incorrectly removed \r from binary files
(where the whole point of a binary file is that it is NOT supposed to
have \r stripped).  So, as part of downstream cygwin unifying the
behavior of grep, sed, and awk, I pointed out that we could apply an
upstream patch to grep to greatly simplify the upstream source by
ripping out special-case code for Cygwin that wasn't correct in the
first place.
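
As an illustration of why blind \r stripping is wrong for binary data,
consider the following Python sketch. The filter is a hypothetical
stand-in that rewrites every CRLF pair to LF; it is not grep's actual
undossify_input code, and the record layout is made up for the example.

```python
import struct


def strip_cr_before_lf(raw: bytes) -> bytes:
    """Hypothetical filter: rewrite every CRLF pair to a bare LF."""
    return raw.replace(b"\r\n", b"\n")


# Two little-endian 16-bit fields. The first value, 0x0A0D, happens to be
# stored as the bytes 0x0D 0x0A, which look exactly like a CRLF pair.
record = struct.pack("<HH", 0x0A0D, 7)

damaged = strip_cr_before_lf(record)

print(len(record), len(damaged))  # the record loses a byte
```

After filtering, the record is one byte short and can no longer be
unpacked with the same format, even though it never contained any text.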

> And DTRT is dependent on the cygwin kernel to perform this service
> so that sed/awk/grep/et.al wouldn't have to deal with it.

Yes, you generally want the right behavior to be done at a common place,
so that it doesn't have to be duplicated everywhere else.

> 
> If so, it appears that cygwin does not have that "strip \r" functionality
> and that's why it is failing for me.

Huh? Cygwin DOES have the ability to strip \r from files. You get it by
mounting the directory containing the file as a text mount.

Maybe you are asking whether cygwin should have an ability to
automatically strip \r from pipelines where the source end of the pipe
is not a native cygwin process (and therefore more likely to be
producing \r), since pipelines are not a file system that you mount and
therefore can't be given a mount option of text-vs-binary.  I don't know
of such an ability at present (the $CYGWIN environment variable used to
accept a substring that forced ALL pipelines into text mode, but that
support was removed years ago because of the problems it caused).
But maybe it is worth proposing a patch to add such a
capability back into cygwin1.dll.  And doing it on JUST pipelines
connected to native processes, rather than all pipelines, will let
cygwin continue to behave sanely on pipelines from other cygwin
processes (where you WANT binary handling).

But such a patch would be to cygwin1.dll (which is NOT maintained here),
so you'd have to propose it downstream to the cygwin list.

> 
> How close am I getting to fully  understanding this?

It's hard for me to say, because I feel like I have been repeating the
same things.  In particular, I've repeated my plea for you to take this
to the cygwin list, and yet here you are still asking on the upstream
sed list.

> 
> I can imagine there might be a quite a lot of "sed consumers" who
> will also experience this failure.

There have been one or two threads on the cygwin list in the past three
months from other people surprised that mixing native Windows processes
with cygwin sed/grep/awk now behaves differently, but surprisingly few
given how much volume the cygwin list gets.  For example:
https://cygwin.com/ml/cygwin/2017-05/msg00161.html

>  And those OS consumers also
> have to deal with over-the-wall Windows and Mac files in their
> environment.

If you are already dealing with files created by one OS and mounted in
another OS, then you should already be quite familiar with how to filter
your files to have desirable line endings for the system where you plan
to process the file.

Furthermore, Cygwin tries hard to emulate Linux. How would you process a
file containing \r\n that you copied onto Linux? That's exactly the same
way you should process that file on cygwin (at least, with default
binary mounts).  The fact that cygwin used to have a hack where it
filtered \r on your behalf (unlike what Linux would do), and now no
longer has that hack, should not affect you if you had already been
stripping \r yourself.  And the fact that the hack corrupted binary
data, and now cygwin can process binary data without corruption, is one
of the stronger justifications why cygwin maintainers decided to remove
the hack.  There was a long email thread on the subject at the time:
https://cygwin.com/ml/cygwin/2017-02/threads.html#00152
(titled Updated [test]: sed-4.4-1)

and probably several others that I didn't bother to locate while typing
this.
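
For reference, stripping \r yourself can be as small as the Python sketch
below. It is one possible filter, not a recommendation of a specific tool:
the function name is hypothetical, and it only drops a \r that sits
directly before the \n, leaving any other \r bytes alone.

```python
import io


def strip_cr_at_eol(stream_in, stream_out) -> None:
    """Copy binary stream_in to stream_out, turning CRLF line ends into LF."""
    for line in stream_in:  # binary streams yield lines split on b"\n"
        if line.endswith(b"\r\n"):
            line = line[:-2] + b"\n"
        stream_out.write(line)


src = io.BytesIO(b"one\r\ntwo\r\nthree\n")
dst = io.BytesIO()
strip_cr_at_eol(src, dst)

print(dst.getvalue())
```

Running the input through a filter like this before the real processing
step gives the same result on Linux and on a cygwin binary mount.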

> 
> I never heard of a text/binary mount point that would cause
> an operating system to treat text files differently.
> 
> Do you have  pointer to some literature that explains that so I can educate
> myself?

How about these pages of the Cygwin documentation:
https://cygwin.com/cygwin-ug-net/using-textbinary.html
https://cygwin.com/cygwin-ug-net/using.html#mount-table

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
