bug-sed

bug#26879: end-of-line issue with cygwin 4.4-1 sed 4.4


From: Eric Blake
Subject: bug#26879: end-of-line issue with cygwin 4.4-1 sed 4.4
Date: Thu, 11 May 2017 15:29:13 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0

On 05/11/2017 03:13 PM, Assaf Gordon wrote:
> Hello all,
> 
> 
> Eric,
> Just to verify (since I'm not very familiar with cygwin, nor with the
> recent sed/cygwin changes):
> 
> If one wants the old sed behavior on cygwin (automatic
> handling of CR/LF),
> all that's needed is rebuilding sed from upstream source?
> 
> That is:
> 
>     wget https://ftpmirror.gnu.org/sed/sed-4.4.tar.xz
>     tar -xf sed-4.4.tar.xz
>     cd sed-4.4
>     ./configure
>     make
>     sudo make install
> 
> And then the sed binary will handle CR/LF transparently?
> (and will also have the old "-b/--binary" flag to disable
> automatic CR/LF handling) ?

No. If you want to FORCE sed to treat input as text (to transparently
ignore CR), you have to make a tweak to the source code (either to link
with cygwin's textmode.o that turns text mode on EVERYWHERE, or to add
freopen("rt") or setmode(O_TEXT) calls in appropriate places).

The default upstream behavior has ALWAYS been to handle files in native
mode (i.e. open("r") - where the choice of text or binary is determined
by the file system).  Downstream Cygwin sed USED to have a patch that
overrode upstream behavior to do freopen(NULL, "rt", stdin) - which is
not portable outside of Cygwin, but which on Cygwin is defined to
forcefully reopen a file in text mode, even if it was originally in
binary mode.  If your file is already in text mode, the downstream patch
made no difference; but if your file was in binary mode, the downstream
patch forcefully corrupted your data by eating \r.
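
To illustrate that corruption, here is a rough shell simulation; it is
not the actual Cygwin code path. It assumes that a forced text-mode read
behaves like dropping the \r of every \r\n pair, and it mimics that with
GNU sed's \r escape. The helper names fake_text_mode and payload are
made up for this sketch.

```shell
# Hypothetical stand-in for a forced text-mode read: drop the CR of
# each CR LF pair. Assumes GNU sed, which understands \r in a regex.
fake_text_mode() {
    sed 's/\r$//'
}

# A 4-byte "binary" payload that happens to contain the bytes 0x0d 0x0a.
payload() {
    printf '\001\r\n\002'
}

# Byte counts before and after the simulated conversion.
before=$(payload | wc -c)
after=$(payload | fake_text_mode | wc -c)
echo "before: $((before)) bytes, after: $((after)) bytes"
```

Under these assumptions the payload shrinks from 4 bytes to 3, which is
harmless for text but changes the content of anything that is really
binary.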

With the Cygwin build 4.4-1 of sed, that downstream patch was eliminated
(along with corresponding downstream hacks in awk and grep, as well as
an upstream simplification in grep made possible now that downstream was
no longer forcing text mode, http://bugs.gnu.org/25707), so that all
three tools reliably treated binary files as binary, and your choice of
text vs. binary mount was honored by using open("r") (rather than
open("rb") which forces binary or open("rt") which is non-POSIX but on
cygwin forces text).

The drawback is that not all input is on a file system - if your input
comes through a pipeline, you can't set the mount mode of a pipeline,
and cygwin assumes all pipes are in binary mode.  But in those cases,
you can always modify your pipeline to inject another filter to eat the
\r before handing the data to sed.
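
As a minimal sketch of such a filter (the helper name strip_cr and the
sample data are invented for illustration), tr can drop the \r bytes
before sed ever sees them:

```shell
# Hypothetical helper: delete every CR byte from stdin.
# Note that this also removes a bare CR that is not followed by LF.
strip_cr() {
    tr -d '\r'
}

# Simulate a producer that writes CRLF line endings into a pipe, then
# clean the stream before handing it to sed.
printf 'one\r\ntwo\r\n' | strip_cr | sed 's/$/!/'
```

With the filter in place, "$" matches right after the visible text
rather than after a stray \r. If only the \r of a \r\n pair should go,
an extra GNU sed stage such as sed 's/\r$//' is an alternative, again
assuming GNU sed's \r escape.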

> 
> I'm asking this to help find an easy work-around in case
> other cygwin users want to revert to the "old ways".
> (also useful for us upstream to know about this cygwin change,
> in case we get more bug reports).

The source code with the downstream patches for the older cygwin builds
is still available; in fact, the EASIEST thing might be to tell
disgruntled cygwin users to google for "Cygwin time machine" and install
grep, sed, and awk from Jan 2017 (pre-dating the Feb 2017 switch in
behavior), as the patch you would apply to upstream sources is already
built into those older downstream binaries.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
