bug#46048: split -n K/N loses data, sum of output files is smaller than input file.
From: Pádraig Brady
Subject: bug#46048: split -n K/N loses data, sum of output files is smaller than input file.
Date: Sun, 24 Jan 2021 16:52:57 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Thunderbird/84.0

On 23/01/2021 04:58, Paul Hirst wrote:
> split --number K/N appears to lose data, with the sum of the sizes of
> the output files being smaller than the original input file by 131072 bytes.
>
> $ split --version
> split (GNU coreutils) 8.30
> ...
> $ head -c 1000000 < /dev/urandom > test.dat
> $ split --number=1/4 test.dat > t1
> $ split --number=2/4 test.dat > t2
> $ split --number=3/4 test.dat > t3
> $ split --number=4/4 test.dat > t4
> $ ls -l
> -rw-r--r-- 1 user user 250000 Jan 22 18:36 t1
> -rw-r--r-- 1 user user 250000 Jan 22 18:36 t2
> -rw-r--r-- 1 user user 250000 Jan 22 18:36 t3
> -rw-r--r-- 1 user user 118928 Jan 22 18:36 t4
> -rw-r--r-- 1 user user 1000000 Jan 22 18:33 test.dat
>
> Surely this should not be the case?

Ugh. This functionality was broken for all files larger than 128KiB,
due to adjustments for handling /dev/zero.
$ truncate -s 1000000 test.dat
$ split --number=4/4 test.dat | wc -c
118928
The following patch fixes it here.
I need to do some more testing before committing.

Thanks!

diff --git a/src/split.c b/src/split.c
index 0660da13f..6aa8d50e9 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1001,7 +1001,7 @@ bytes_chunk_extract (uintmax_t k, uintmax_t n, char *buf, size_t bufsize,
     }
   else
     {
-      if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
+      if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
         die (EXIT_FAILURE, errno, "%s", quotef (infile));
       initial_read = SIZE_MAX;
     }