Re: Feature request for head command
From: Pádraig Brady
Subject: Re: Feature request for head command
Date: Wed, 21 Jun 2006 10:05:40 +0100
User-agent: Mozilla Thunderbird 1.0.8 (X11/20060502)

Robert McKay wrote:
> Synopsis of the problem:
>
> When you use head to read a certain number of lines out of a pipe,
> it sometimes eats more data than you ask it to.
>
> For example:
>
> address@hidden ~]$ ( echo hello; echo world; ) | ( head -1 > /dev/null; cat)
> address@hidden ~]$ ( echo hello; sleep 1; echo world; ) | ( head -1 > /dev/null; cat)
> world
>
> So.. why does the first command not print anything but the second
> command prints out "world"?
>
> Well in the first case hello and world were immediately available to
> be read and head -1 read them both into its buffer before discovering
> that actually it should have stopped at the first newline.
>
> In the second instance the sleep 1 in the middle causes head's read()
> call to return early, (because no more data was immediately available
> to be read), and head realized that it already had a newline and
> exited, leaving the "world" for cat to read.
>
> What to do about this? head can't unread data that it's already read
> so the only solution is for it to read the input one byte at a time -
> constantly checking for a \n (newline). Is this inefficient? Yes, of
> course it is but it would also be very useful. I'm not proposing that
> this be made the default behavior, simply that a new option be added
> to support it.
>
> I eagerly await your flames :-)
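
For reference, the byte-at-a-time approach proposed above might look
roughly like the following. This is only a sketch in Python using the
os-level read() wrapper, not code from head itself; the function name
read_line_unbuffered is made up for illustration.

```python
import os

def read_line_unbuffered(fd):
    """Read one line from fd a byte at a time (hypothetical helper).

    Because each read() asks for a single byte, nothing past the
    newline is ever consumed, at the cost of one system call per byte.
    """
    chunks = []
    while True:
        b = os.read(fd, 1)      # one read() call per byte
        if not b:               # EOF before a newline was seen
            break
        chunks.append(b)
        if b == b"\n":          # stop exactly at the end of the line
            break
    return b"".join(chunks)

# Both lines are already sitting in the pipe, as in the first example.
r, w = os.pipe()
os.write(w, b"hello\nworld\n")
os.close(w)

first = read_line_unbuffered(r)
rest = os.read(r, 4096)         # what a following "cat" would still see
os.close(r)
print(first, rest)
```

Here the second line is still in the pipe for the next reader, which is
the behaviour the requested option would provide.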
Note that head works as expected if the input is a seekable file,
i.e. it does an lseek() back over the data it didn't process.
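
A rough illustration of that read-then-seek-back idea, again as a
Python sketch rather than head's actual source (the helper name
take_first_line and the 4096-byte buffer size are assumptions made for
the example):

```python
import os
import tempfile

def take_first_line(fd, bufsize=4096):
    """Consume one line from a seekable fd using block reads.

    Hypothetical helper: reads in large blocks, then seeks back over
    whatever was read beyond the first newline.
    """
    line = b""
    while True:
        buf = os.read(fd, bufsize)
        if not buf:                       # EOF before a newline
            return line
        nl = buf.find(b"\n")
        if nl == -1:                      # no newline yet, keep reading
            line += buf
            continue
        unprocessed = len(buf) - (nl + 1)
        # Give back the bytes we read but did not process.
        # On a pipe this lseek() would fail with ESPIPE.
        os.lseek(fd, -unprocessed, os.SEEK_CUR)
        return line + buf[:nl + 1]

fd, path = tempfile.mkstemp()
os.write(fd, b"hello\nworld\n")
os.lseek(fd, 0, os.SEEK_SET)

first = take_first_line(fd)
rest = os.read(fd, 4096)                  # next reader starts at "world"
os.close(fd)
os.unlink(path)
print(first, rest)
```

The same lseek() call cannot be used on a pipe, which is why the extra
data is lost in the piped example.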
I've documented stdin buffering issues here:
http://www.pixelbeat.org/programming/stdio_buffering/
There is a patch there that I'm thinking of
cleaning up and applying to all appropriate coreutils.
Pádraig.