Re: Replacement string for process number


From: Ole Tange
Subject: Re: Replacement string for process number
Date: Thu, 23 Dec 2010 23:07:52 +0100

On Wed, Dec 22, 2010 at 3:51 PM, Jay Hacker <jayqhacker@gmail.com> wrote:
> I'd like to be able to use the number of a process in a GNU parallel
> command.

From what you describe below, it is the number of the slot of the
process. So if you run -P32 you will get at most 32 values, fewer if
there are fewer than 32 arguments.

GNU Parallel cannot do that at the moment.

$PARALLEL_PID and $PARALLEL_SEQ are a bit similar to this.

> For instance, I want to distribute 100 files semi-evenly to 32
> machines named node00 - node31.  Perhaps it would look something like this,
> using "{p}" as the replacement string for the process number:
>
> $ parallel -P32 scp {} node{p}:/data ::: *.gz
>
> That way, 3 or 4 files go to each of node00, node01...

So a shorthand for something like:

parallel printf '%02d\\t%s\\n' \$\(\(\$PARALLEL_SEQ%32\)\) ::: *gz |
parallel --colsep '\t' -P32 scp {2} node{1}:/data

but with the added benefit that if one of the files is big, one node
may get only that single file.
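
To make the static assignment explicit, here is a minimal sketch in plain
shell of what the first parallel in that pipeline computes. The helper name
assign_slots is made up for illustration and is not part of GNU Parallel; it
assumes sequence numbers start at 1, as $PARALLEL_SEQ does.

```shell
# Hypothetical helper (not part of GNU Parallel): print "<slot><TAB><file>"
# for each file, with slot = sequence number % number of slots.
# Sequence numbers start at 1, as $PARALLEL_SEQ does.
assign_slots() (
  slots=$1; shift
  seq=1
  for file in "$@"; do
    # Zero-pad the slot to 2 digits, like the printf in the pipeline above.
    printf '%02d\t%s\n' "$((seq % slots))" "$file"
    seq=$((seq + 1))
  done
)

assign_slots 3 a.gz b.gz c.gz d.gz
```

With 3 slots the files a.gz, b.gz, c.gz and d.gz get the slots 01, 02, 00
and 01. The assignment depends only on the position of the file in the
list, not on which job finishes first, which is why it cannot adapt to one
big file.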

I understand that on the local computer you want {p} to be the job
slot number with 0's prepended.

Please describe what {p} will be when some of the jobs are run on
remote hosts (-S :,server1).

Please describe what {p} will be when the argument for -P is a
filename (and thus can be changed during the run). How many 0's should
be prepended if it is changed from -P9 to -P10?
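
To make the padding question concrete, here is a sketch of one possible
rule: pad to the width of the highest slot number, with slots numbered from
0. This rule and the names pad_width and slot_name are assumptions for
illustration only; GNU Parallel does nothing like this today.

```shell
# Hypothetical rule: slots are numbered 0 .. slots-1 and padded to the
# number of digits in the highest slot number.
pad_width() (
  highest=$(( $1 - 1 ))
  printf '%s\n' "${#highest}"   # number of digits in the highest slot
)

# slot_name <slot> <slots>: print the zero-padded slot number.
slot_name() (
  width=$(pad_width "$2")
  printf "%0${width}d\n" "$1"
)

slot_name 5 9    # one digit is enough for slots 0..8
slot_name 5 11   # slots 0..10 need two digits
```

Under this rule slot 5 is called "5" with 9 or 10 slots but "05" with 11,
so names already in use would change when the -P file is edited during the
run. If slots were numbered from 1 instead, the change would already happen
at 10 slots.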

Please describe what {p} will be when the command is retried (--retries > 1).

> Or I want to concatenate 100 files semi-evenly into 16 pieces:
>
> $ parallel -P16 "cat {} >> output-file{p}.txt" ::: ~/files/*.txt

So a shorthand for something like:

parallel printf '%02d\\t%s\\n' \$\(\(\$PARALLEL_SEQ%16\)\) ::: ~/files/*.txt |
parallel --colsep '\t' -P16 "cat {2} >> output-file{1}.txt"

but with the added benefit that if one of the files is big then one of
the cat's may only get that single file.

> gives 16 files named output-file00.txt ... output-file15.txt, each
> consisting of 6 or 7 of the input files.
>
> It's also useful to have a replacement string for the total number of
> processes (the -P given, basically), maybe {P}, and the number of the input
> item, maybe {n}.

{n} seems to be what $PARALLEL_SEQ is today.

> To clarify, it would work something like this:
>
> $ seq 0 10 | parallel -P3 echo {n}/{p}/{P}
> 0/0/3 0
> 1/1/3 1
> 2/2/3 2
> 3/1/3 3
> 4/0/3 4
> 5/2/3 5
> 6/2/3 6
> 7/1/3 7
> 8/0/3 8
> 9/0/3 9
> 10/1/3 10
>
> Though of course the order of the lines may be different.
>
> Is there a better way to do this already, or is this something that would be
> straightforward to add?

I am reluctant to add more magic {}'s as they may give nasty surprises
to people who use {p} for something else. You could access a
variable like ${p} and thus get a nasty surprise.
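
A sketch of that surprise, assuming a purely textual substitution of {p}.
This is not how GNU Parallel is implemented; naive_replace is a made-up
helper that only shows how a shell variable reference would be hit as well.

```shell
# Hypothetical helper: replace every literal "{p}" in a command string
# with the given value, the way a naive replacement string would.
naive_replace() (
  # In a basic regular expression, { and } are ordinary characters.
  printf '%s\n' "$1" | sed "s/{p}/$2/g"
)

# The user meant the shell variable ${p}, but it is rewritten too.
naive_replace 'echo ${p} > node{p}.log' 07
```

The command comes out as: echo $07 > node07.log. The shell variable
reference has been turned into something else entirely.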

That being said the idea may be of value, and we can probably come up
with a {} construct that would not exist in normal bash or perl ({=}, {%} and
{#} spring to mind).

It is definitely not a trivially simple task implementing it. I would
reckon it is a medium sized task.

I am not convinced this is worth the effort and the added complexity,
though: in my experience the examples you have shown so far are
something very few people need. Off the top of my head I cannot
remember that I have had a use for that even a single time in the past
20 years of very different jobs. But if more users find it a good idea
or come up with more generally useful examples, then I might be
convinced.

Also have a look at https://savannah.gnu.org/bugs/?31678. It is a
feature that would solve your two examples provided that the
arguments fit on a single command line (because scp and cat can take more than
one argument).

> $ parallel -P32 scp {} node{p}:/data ::: *.gz

Would become:

$ parallel -P32 -X scp {} node\$PARALLEL_SEQ:/data ::: *.gz

> $ parallel -P16 "cat {} >> output-file{p}.txt" ::: ~/files/*.txt

Would become:

$ parallel -P16 -X cat {} ">>" output-file\$PARALLEL_SEQ.txt ::: ~/files/*.txt

I would be much more easily convinced to implement that.


/Ole


