Re: parallel cat


From: Dan Kokron
Subject: Re: parallel cat
Date: Fri, 29 Jul 2011 12:41:56 -0400

Thanks to all who made suggestions.  Using parallel for this task did
improve performance substantially.

Dan

On Sun, 2011-07-17 at 09:08 -0500, Ole Tange wrote:
> On Fri, Jul 15, 2011 at 8:39 PM, Dan Kokron <daniel.kokron@nasa.gov> wrote:
> 
> > I have a bunch (~200) small (1K to 100K) binary files that I want to
> > 'cat' into a larger file.  I usually use "cat pe* > diag", but this
> > takes considerable time on the Lustre file system we are using.  I am
> > exploring using GNU parallel for this task but have run into some
> > difficulties.  Basically the resulting diag file only contains one of
> > the input files.
> >
> > I've tried the following variations.
> >
> > parallel "cat {} >diag_amsua_n18_03.2011041700" ::: pe*
> > parallel cat {} ">"diag_amsua_n18_03.2011041700 ::: pe*
> > ls pe* | parallel cat {} ">"diag_amsua_n18_03.2011041700
> > ls pe* | parallel -j4 -k cat {} ">"diag_amsua_n18_03.2011041700
> > ls pe* | parallel -k cat {} ">"diag_amsua_n18_03.2011041700
> > parallel -j4 -k "cat {} >diag_amsua_n18_03.2011041700" ::: pe*
> 
> You are _so_ close.
> 
> parallel cat >diag_all ::: pe*
> 
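> UNIX users will probably find this form more readable (it does exactly
> the same thing, because the shell applies the unquoted redirection to
> parallel itself wherever it appears on the command line):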
> 
> parallel cat ::: pe* >diag_all
> 
> Or, if you want the output kept in the same order as the input:
> 
> parallel -k cat ::: pe* >diag_all
> 
> I have no experience with Lustre, but I would imagine that it is slow
> to deliver the first byte of a file and pretty fast after that, and
> that the slowness comes from waiting rather than from doing work. If
> that is the case, then it will be OK to run a lot of cats
> simultaneously (-j0 runs as many jobs in parallel as possible):
> 
> parallel -j0 cat ::: pe* >diag_all
> 
> These sections of the man page touch on the subject of using the
> output from GNU Parallel:
> 
> EXAMPLE: Rewriting a for-loop and a while-read-loop
> EXAMPLE: Rewriting nested for-loops
> EXAMPLE: Keep order of output same as order of input
> EXAMPLE: Processing a big file using more cores
> 
> If you believe it can be explained better please post your suggestion
> for discussion here.
> 
> 
> /Ole
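
For anyone finding this thread later, here is a minimal sketch of why the
quoted ">" variants left only one input file in diag, while the unquoted
redirect collects everything. It assumes nothing about Lustre and does not
call GNU parallel at all: plain sequential loops stand in for the jobs
parallel would start, and the names pe001..pe003, diag_each and diag_once
are made up for the demo.

```shell
#!/bin/sh
# Sketch only: plain loops stand in for the jobs GNU parallel would start.
# The file names pe001..pe003 and diag_* are hypothetical demo names.
tmp=$(mktemp -d)
cd "$tmp"
printf 'A\n' > pe001
printf 'B\n' > pe002
printf 'C\n' > pe003

# Roughly like: parallel cat {} ">"diag_each ::: pe*
# The quoted ">" is part of every job, so each job truncates diag_each.
for f in pe*; do
    cat "$f" > diag_each
done

# Roughly like: parallel -k cat ::: pe* > diag_once
# The unquoted ">" is applied once, to the combined output of all jobs.
for f in pe*; do
    cat "$f"
done > diag_once

each_lines=$(wc -l < diag_each)
once_lines=$(wc -l < diag_once)
echo "per-job redirect: $each_lines line(s)"
echo "single redirect:  $once_lines line(s)"
```

In this sequential stand-in, diag_each ends up holding only the last input
and diag_once holds all three. With real parallel jobs the survivor in the
per-job case depends on timing, but the file is still truncated each time a
job opens it.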
-- 
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron@nasa.gov
Phone: (301) 614-5192
Fax:   (301) 614-5304



