
bug#13498: "cut -f" lags a line


From: Pádraig Brady
Subject: bug#13498: "cut -f" lags a line
Date: Sun, 20 Jan 2013 12:41:12 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 01/19/2013 08:35 AM, Scott Lamb wrote:
"cut -f" has an apparently long-standing behavior that I'd consider a
bug: it does not fully send line N to stdout until the first character
of line N+1 has been read on stdin. This is confusing when stdin comes
from "tail -f" or the like. The exact behavior varies slightly. If
stdin is a tty, all but the trailing newline will be flushed
immediately and then the trailing newline will be flushed when the
next character shows up. If stdin is not a tty, there's no flush at
all until the next character shows up.

For example, suppose I type the following into a shell on Ubuntu 12.04.1,
which ships cut from coreutils 8.13 and glibc package version
2.15-0ubuntu10.3:

     cut -f1-
     foo
     bar
     baz
     ^D

I will see the following:

     $ cut -f1-
     foo
     foobar

     barbaz

     baz
     $

and if I instead use "cat | cut -f1-" in the first line, I will see
the following:

     $ cat | cut -f1-
     foo
     bar
     foo
     baz
     bar
     baz
     $

(coreutils's cut -c does not have the same laggy behavior. Neither
does BSD cut on my OS X machine in either -c or -f mode.)

This code in cut_fields (still found in trunk tip) is responsible for
delaying the newline; it runs between the newline being read and being
written:

       if (c == '\n')
         {
           c = getc (stream);
           if (c != EOF)
             {
               ungetc (c, stream);
               c = '\n';
             }
         }

I believe that code is there to avoid turning one newline at EOF into
two, but that goal could be accomplished in another way.
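
One possible shape for "another way" is to remember the last character
written and add a final newline at EOF only when it is missing, so no
look-ahead read is needed after a newline. The following is only a
minimal sketch under that assumption, not the coreutils code or its
eventual fix; copy_lines_no_lookahead is a hypothetical name, and the
field-selection logic of cut_fields is left out entirely.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from coreutils: copy IN to OUT one
   character at a time, flushing after every newline, and append a
   final newline only if the input did not already end with one.
   Returns 0 on success, -1 on a read or write error.  */
int
copy_lines_no_lookahead (FILE *in, FILE *out)
{
  int c;
  int prev = '\n';   /* Treat empty input as already terminated.  */

  while ((c = getc (in)) != EOF)
    {
      if (putc (c, out) == EOF)
        return -1;
      /* Line N is complete: send it now, without reading line N+1.  */
      if (c == '\n' && fflush (out) == EOF)
        return -1;
      prev = c;
    }

  /* Input ended mid-line: supply the missing newline exactly once.  */
  if (prev != '\n' && putc ('\n', out) == EOF)
    return -1;

  return (ferror (in) || fflush (out) == EOF) ? -1 : 0;
}
```

Calling copy_lines_no_lookahead (stdin, stdout) from main would echo
each line as soon as its newline arrives. The per-line fflush is a
choice made for this sketch to keep output prompt regardless of
stdout's buffering mode; a real utility might prefer to leave that to
stdio.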

I don't know exactly why the behavior differs based on stdin being a
tty or not. My best guess is that glibc might have some logic that, if
stdin is a tty, automatically flushes stdout any time the program
blocks on stdin. glibc's stdio internals are a bit hard for me to
follow, so I haven't found the code in question. Apparently this
behavior is at least loosely standardized; I see a Stack Overflow post
quoting the following:

"""
The input and output dynamics of interactive devices shall take place
as specified in 7.19.3. The intent of these requirements is that
unbuffered or line-buffered output appear as soon as possible, to
ensure that prompting messages actually appear prior to a program
waiting for input.

(ISO/IEC 9899:TC2 Committee Draft -- May 6, 2005, page 14).
"""

For my reference:
http://comments.pixelbeat.org/programming/stdio_buffering/#comment-250521

Yes, the use of ungetc() is awkward in cut.
I notice that pr is the only other util using ungetc().
Also, the i18n version of cut on my system has a rewritten
cut_fields() function that doesn't exhibit this behavior.

ungetc() is coupled with the use of getndelim2(),
but I'll have a look at addressing this.

thanks,
Pádraig.




