Line buffering doesn't work with --shard

From: David King
Subject: Line buffering doesn't work with --shard
Date: Thu, 12 Dec 2019 17:54:58 -0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.9.1

This happens with version 20191122 on macOS 10.14 and Ubuntu 14.04. I
also tested 20191022 with the same result. I believe these are two
separate bugs with similar symptoms.

Bug #1:

for x in `seq 1 10`; do cat /usr/share/dict/words; done \
  | awk -F'\t' '{print $1 "\t" $1 "\t" $1 "\t" $1}' \
  | ./parallel --line-buffer -j16 --pipe --colsep '\t' --shard 1 -- cat \
  | awk -F'\t' 'NF!=4'

This just generates a large stream with 4 identical columns per record,
then runs it through 16 copies of cat, and checks that each record still
has 4 columns. It should produce no output, but instead the output looks
like this:

Scheherazade    Scheherazade    Sc      strangleweed    strangleweed
abstractnesses  abstractnesses  abstractnesses  abstractnesggeration
overexaggeration        overexaggeration
gyvi    evolutionarily
brokenheartedness       brokenheartedness       brokenheartedness
brokenhears     autoplasties    autoplasties

All records should be 4 columns of the same word, but you can see that
many records are combined or split.
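
The NF!=4 filter only catches records whose column count changed. A stricter check, sketched below, also flags records that still have 4 columns but whose columns no longer match, which could happen if a split or merge lands on a tab boundary. The function name check_records is made up for illustration and is not part of parallel.

```shell
# Hypothetical helper (not part of GNU parallel): print every record that
# is not exactly four identical tab-separated columns.
check_records() {
  awk -F'\t' 'NF != 4 || $1 != $2 || $2 != $3 || $3 != $4'
}
```

It can replace the final awk stage of the test pipeline, for example `... --shard 1 -- cat | check_records`. No output means every record survived intact.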

Bug #2: the above happens on *both* the read side and the write side.
This should also have no output:

for x in `seq 1 10`; do cat /usr/share/dict/words; done \
  | awk -F'\t' '{print $1 "\t" $1 "\t" $1 "\t" $1}' \
  | ./parallel --line-buffer -j16 --pipe --colsep '\t' --shard 1 -- "awk -F'\t' 'NF!=4'"

but it does produce output (less of it, but similarly structured).

I realise that bug #1 could be bug #2 in disguise but I believe I've
confirmed that both happen independently. If they were the same bug,
this would have no output:

for x in `seq 1 10`; do cat /usr/share/dict/words; done \
  | awk -F'\t' '{print $1 "\t" $1 "\t" $1 "\t" $1}' \
  | ./parallel --line-buffer -j16 --pipe --colsep '\t' --shard 1 -- "awk -F'\t' 'NF==4'" \
  | awk -F'\t' 'NF!=4'

but it does produce output, though less than either of the previous cases.
So little that I had to run it 4 or 5 times to see any (probably because
both bugs must hit the same row, which is rare). In summary: test case #1
produces a lot of output, test case #2 produces less, and test case #3
produces so little that it takes a few tries to see any at all.
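
Since test case #3 only fails intermittently, a small retry loop may save some manual re-running. This is only a sketch: run_until_output is a made-up name, and it assumes the test pipeline has been saved in a script of its own (called ./case3.sh here purely as an example).

```shell
# Hypothetical helper: run a command up to MAX times and stop at the first
# run that prints anything, reporting how many lines it printed.
run_until_output() {
  max=$1; shift
  i=1
  while [ "$i" -le "$max" ]; do
    out=$("$@")
    if [ -n "$out" ]; then
      # tr strips the padding some wc implementations add
      count=$(printf '%s\n' "$out" | wc -l | tr -d ' ')
      printf 'run %s: %s bad record(s)\n' "$i" "$count"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `run_until_output 20 ./case3.sh` would stop at the first run that shows damaged records, and return non-zero if all 20 runs were clean.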

Separately, I've also experienced intermittent hangs with --shard. The
symptom is that the background processes have all finished and output
has stopped, but parallel doesn't exit. I haven't found a minimal way to
reproduce it, but the second and third test cases above trigger it for me.
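
One way to make the hang easier to spot in repeated runs is to wrap the test in a time limit. The sketch below assumes the timeout command from GNU coreutils is available (on macOS it is typically installed as gtimeout); hang_check is a made-up name, and the limit has to be chosen well above the normal run time.

```shell
# Hypothetical helper: run a command under a time limit and report whether
# it exited on its own. GNU timeout exits with status 124 when the limit
# is reached.
hang_check() {
  limit=$1; shift
  timeout "$limit" "$@" >/dev/null
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "hung: no exit within ${limit}s"
    return 1
  fi
  echo "exited with status $status"
}
```

For example, `hang_check 600 ./case3.sh` (with the pipeline saved in a script, as an assumption) would report a hang if parallel is still running after 10 minutes.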
