Re: parallel cat


From: Ole Tange
Subject: Re: parallel cat
Date: Mon, 18 Jul 2011 13:13:50 +0200

On Sun, Jul 17, 2011 at 4:32 PM, Hans Schou <chlor@schou.dk> wrote:
> On Fri, 15 Jul 2011, Dan Kokron wrote:
>
>> I have a bunch (~200) small (1K to 100K) binary files that I want to
>> 'cat' into a larger file.  I usually use "cat pe* > diag", but this
>
> I guess that "pe*" is sorted before they are 'cat'. If you don't need them
> sorted try this one:
>  time ls -U pe* | parallel cat {} '>>' diag

I believe the '>>' approach is dangerous as it causes a race condition:

# create 3 big files
parallel perl -e "'print \"{}\"x100000000' >aa{}" ::: a b c
# append them together in parallel
parallel -j0 cat '>>diag' ::: aa*
# Check if all the a's are first, followed by b's and c's, by replacing
# repeating letters with a single letter.
perl -e 'while(sysread(STDIN,$a,100000000)) { $a=~s/a+/a/g;
$a=~s/b+/b/g; $a=~s/c+/c/g; print "$a" }' < diag

On my system the a's, b's, and c's are mixed. I believe this is because
the three cat processes run in parallel while appending to the same
file. When you redirect the output inside the command being run, you do
not give GNU Parallel a chance to make sure the output does not mix.
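
The interleaving does not depend on GNU Parallel itself. Here is a minimal sketch of the same race with plain background jobs, assuming a POSIX shell. The scratch directory and the smaller 1 MB stand-in files are assumptions made only to keep the sketch self-contained; whether mixing is actually visible depends on timing and file size.

```shell
# Scratch directory and small stand-in files (assumptions for illustration)
cd "$(mktemp -d)" || exit 1
for x in a b c; do head -c 1000000 /dev/zero | tr '\0' "$x" > "aa$x"; done

# Three independent writers append to the same file at the same time.
# Each cat writes in chunks, and nothing orders the chunks of one cat
# relative to the chunks of the others.
cat aaa >>diag &
cat aab >>diag &
cat aac >>diag &
wait

# On a local filesystem all bytes normally arrive; only their order is left to timing.
wc -c < diag
```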

You may be lucky and not see the output mixed, but that will be purely
by chance.
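
Since the result varies from run to run, it helps to have a check that is quick to repeat. The following is a sketch of the same idea as the perl check above, using tr -s to squeeze repeated letters. The helper name squeeze and the two tiny sample files are hypothetical and only illustrate what clean and mixed output look like.

```shell
# Hypothetical sample data: one unmixed file, one interleaved file
tmp=$(mktemp -d)
printf 'aaaabbbbcccc' > "$tmp/clean"
printf 'aabbaaccbbcc' > "$tmp/mixed"

# Squeeze every run of a, b, or c down to a single character
squeeze() { tr -s 'abc' < "$1"; }

squeeze "$tmp/clean"; echo   # prints: abc
squeeze "$tmp/mixed"; echo   # prints: abacbc
```

Unmixed output collapses to exactly one letter per input file; anything longer means the writers interleaved.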

Using GNU Parallel as follows will make sure the output is not mixed:

parallel cat ::: aa* > diag
# Check if all the a's are first, followed by b's and c's, by replacing
# repeating letters with a single letter.
perl -e 'while(sysread(STDIN,$a,100000000)) { $a=~s/a+/a/g;
$a=~s/b+/b/g; $a=~s/c+/c/g; print "$a" }' < diag


/Ole


