bug-coreutils
bug#33281: head does not consume input after '-c' is satisfied


From: Philip Rowlands
Subject: bug#33281: head does not consume input after '-c' is satisfied
Date: Mon, 05 Nov 2018 21:17:49 +0000

On Mon, 5 Nov 2018, at 20:30, Luiz Angelo Daros de Luca wrote:
> 
> Once head has read enough bytes to satisfy the -c option, it stops reading
> input and quits.
> This differs from what -n does, and also from both the FreeBSD and busybox
> head implementations.
> 
> With GNU Coreutils head:
> 
> $ echo -e "123\n456\n789" | { head -n 1; while read a; do echo "-$a-";
> done; }
> 123

That example is incomplete: with -n, head doesn't read everything, but it does 
read more than one line. On my (rather aged Linux) system:
$ head --version
head (GNU coreutils) 8.25

$ seq 1864 | { head -n 1; while read a; do echo "-$a-"; done; }
1
--
-1861-
-1862-
-1863-
-1864-

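What's special about 1860 lines of output? It's just over the amount of data 
which head reads from the pipe, 8192 bytes. The one byte left unread is the 
newline after "1860", which is why the loop first prints an empty "--".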

$ seq 1860 | wc -c
8193
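
The byte count can be checked without running seq at all. This is a rough 
sketch in plain sh; seq_bytes is a hypothetical helper name, not part of 
coreutils. It adds up the digits of each number plus one newline:

```shell
# seq_bytes N: print how many bytes `seq N` would write
# (the decimal digits of each number, plus one newline per line).
seq_bytes() {
    n=$1
    total=0
    i=1
    while [ "$i" -le "$n" ]; do
        total=$((total + ${#i} + 1))   # ${#i} is the number of digits in i
        i=$((i + 1))
    done
    echo "$total"
}

seq_bytes 1859   # prints 8188
seq_bytes 1860   # prints 8193
```

So bytes 8189 to 8192 are the digits of "1860", and its trailing newline is 
byte 8193, the first byte beyond an 8192-byte read.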

> $ echo -e "123\n456\n789" | { head -c 4; while read a; do echo "-$a-";
> done; }
> 123
> -456-
> -789-

In this case head knows it only needs 4 bytes, so only reads 4 bytes.

> With all other head implementations I tested:
> 
> $ echo -e "123\n456\n789" | { head -c 4 ; while read a ; do echo "-$a-" ;
> done ; }
> 123
> $
> 
> It would make sense for both -n and -c to have the same semantics, differing
> only in whether they count bytes or lines.

Consistency would be good, but consider that in the case of lines, head doesn't 
know up-front how much data to read. The only way to read exactly the right 
amount, and not a byte more, would be to read one byte at a time, which is 
something of a performance killer. It's not possible to "un-read" data you've 
collected from a pipe via the read syscall.
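
The shell's own read builtin is an example of that trade-off: it must not 
consume input past the newline, so on a pipe it reads a byte at a time. A 
minimal sketch (first_line_only is a hypothetical name) that takes exactly one 
line and leaves the rest for the following commands:

```shell
# first_line_only: copy exactly one line from stdin to stdout,
# consuming nothing beyond that line's newline.
first_line_only() {
    IFS= read -r line && printf '%s\n' "$line"
}

printf '123\n456\n789\n' | { first_line_only; while read -r a; do echo "-$a-"; done; }
```

Here the loop still sees 456 and 789, at the cost of one read call per byte 
of the first line.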

To achieve consistency in the other direction, head could forgo the 
optimization that reduces the number of bytes read, and always read 8192 bytes, 
knowing that some would be discarded. This seems to be more in line with the 
other implementations you've tried.

For consistency's sake, what would these do? For widely differing values, the 
only way to produce the same residual output would be to consume all input data.
$ cat file.txt | { head -n 100; wc -c; }
$ cat file.txt | { head -c 100KB; wc -c; }
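
If a script needs the same leftover input whichever head is installed, one 
option is to drain the remaining input explicitly. This is only a sketch, and 
head_then_drain is a hypothetical wrapper, not an existing tool:

```shell
# head_then_drain ARGS...: run head with the given arguments, then
# discard whatever input head left unread, so nothing remains for
# later commands regardless of how much head itself consumed.
head_then_drain() {
    head "$@"
    cat >/dev/null
}

printf '123\n456\n789\n' | { head_then_drain -c 4; while read -r a; do echo "-$a-"; done; }
```

With this wrapper both -c 4 and -n 1 print just "123" and leave nothing for 
the loop.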


Cheers,
Phil




