Line buffering doesn't work with --shard
From: David King
Subject: Line buffering doesn't work with --shard
Date: Thu, 12 Dec 2019 17:54:58 -0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.9.1
This happens with version 20191122 on macOS 10.14 and Ubuntu 14.04. I
also tested 20191022 with the same result. I believe these are two
separate bugs with similar symptoms.
Bug #1:
for x in `seq 1 10`; do cat /usr/share/dict/words; done \
  | awk -F'\t' '{print $1 "\t" $1 "\t" $1 "\t" $1}' \
  | ./parallel --line-buffer -j16 --pipe --colsep '\t' --shard 1 -- cat \
  | awk -F'\t' 'NF!=4'
This just generates a large file with 4 identical columns per record,
then runs it through 16 copies of cat, and checks that each record still
has 4 columns. It should have no output, but instead the output looks like
Scheherazade Scheherazade Sc strangleweed strangleweed
strangleweed
abstractnesses abstractnesses abstractnesses abstractnesggeration
overexaggeration overexaggeration
gyvi evolutionarily
brokenheartedness brokenheartedness brokenheartedness
brokenhears autoplasties autoplasties
...
All records should be 4 columns of the same word, but you can see that
many records are combined or split.
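A slightly stricter check than NF!=4 may be useful here, since a damaged record could in principle still end up with exactly 4 columns. The following is only a sketch: check_records is a hypothetical helper name (not part of GNU parallel), and it assumes the input was generated as above, with four identical tab-separated columns per record.

```shell
# Hypothetical helper, not part of GNU parallel.
# Prints every record that is NOT exactly four identical tab-separated columns.
check_records() {
  # Appending "" forces awk to compare the fields as strings, not numbers.
  awk -F'\t' 'NF != 4 || $1"" != $2"" || $2"" != $3"" || $3"" != $4""'
}
```

It can replace the final awk stage of the pipeline, e.g. `... --shard 1 -- cat | check_records`, and should print nothing if every record survived intact.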
Bug #2: the above happens on *both* the read side and the write side.
This should also have no output:
for x in `seq 1 10`; do cat /usr/share/dict/words; done \
  | awk -F'\t' '{print $1 "\t" $1 "\t" $1 "\t" $1}' \
  | ./parallel --line-buffer -j16 --pipe --colsep '\t' --shard 1 -- "awk -F'\t' 'NF!=4'"
but it does produce output: less of it, but similarly structured.
I realise that bug #1 could be bug #2 in disguise but I believe I've
confirmed that both happen independently. If they were the same bug,
this would have no output:
for x in `seq 1 10`; do cat /usr/share/dict/words; done \
  | awk -F'\t' '{print $1 "\t" $1 "\t" $1 "\t" $1}' \
  | ./parallel --line-buffer -j16 --pipe --colsep '\t' --shard 1 -- "awk -F'\t' 'NF==4'" \
  | awk -F'\t' 'NF!=4'
but it does produce output, less than either of the previous cases.
The amount is small enough that I had to run it 4 or 5 times to see any
(probably because both bugs need to hit the same row, which is rare).
To summarise: test case #1 produces a lot of output, test case #2
produces less, and test case #3 produces so little that it takes a few
tries to get any at all.
Separately, I've also experienced intermittent hangs with --shard. The
symptom is that the background processes have all finished and output
has stopped, but parallel doesn't exit. I haven't found a minimal way to
reproduce it, but the second and third test cases above trigger it for me.