Dear All,
I am trying to use GNU Parallel on a cluster with several nodes, where each node has up to 12 cores.
Here the file "file_name" contains the two parameters that are fed to my script "program.sh", which needs two parameters to run.
The file "login_server_name" contains the address of each node in the cluster.
Example of what my "file_name" contains:
1 1
1 2
1 3
..........
1 12
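For reference, a minimal sketch of how a parameter file like this could be generated. The name "params.txt" is only a placeholder I am assuming here; it stands in for whatever $file_name points to.

```shell
#!/bin/sh
# Hypothetical file name, standing in for $file_name.
file_name=params.txt

# Write the pairs "1 1" through "1 12", one pair per line,
# separated by a single space (the separator given to --colsep).
for i in $(seq 1 12); do
    printf '1 %s\n' "$i"
done > "$file_name"

# Show the result.
cat "$file_name"
```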
Example of what my "login_server_name" file contains. For the
single-node case, it will have entries like these:
node_1_server_address
node_1_server_address
-----
node_1_server_address
or equivalently
12/node_1_server_address
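A small sketch of writing the second form of this file from the shell. The file name "nodes.txt" is a placeholder I am assuming, and "node_1_server_address" is still the stand-in for the real address.

```shell
#!/bin/sh
# Hypothetical file name, standing in for $login_server_name.
login_server_name=nodes.txt

# One line per node, in the form <number of cores>/<address>.
printf '12/%s\n' node_1_server_address > "$login_server_name"

# Show the result.
cat "$login_server_name"
```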
Here is my understanding of how to use GNU Parallel on a single node:
cat "$file_name" | parallel -u -j 12 --sshloginfile "$login_server_name" --colsep ' ' program.sh {1} {2}
My idea is for the above command to distribute my jobs across the cores of a single node, one job per core. If my implementation is correct, this is what I expect GNU Parallel to do:
At core 1 of node 1:
"program.sh 1 1" ought to run
At core 2 of node 1:
"program.sh 1 2" ought to run
(and so on)
At core 12 of node 1:
"program.sh 1 12" ought to run
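One way I could at least check the argument splitting is a local dry run. This is only a sketch under assumptions: it leaves out --sshloginfile, so it says nothing about which node or core a job lands on, and it feeds in just the first three pairs. The output file "dry_run.txt" is a name I made up for this check.

```shell
#!/bin/sh
# Only proceed if the installed "parallel" is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # --dry-run prints each command instead of running it;
    # -k keeps the printed commands in input order.
    printf '1 %s\n' 1 2 3 |
        parallel --dry-run -k -j 12 --colsep ' ' program.sh {1} {2} > dry_run.txt
    cat dry_run.txt
else
    echo "GNU Parallel not found; skipping the dry run" >&2
fi
```

If the printed commands have the two columns in the {1} and {2} positions, then at least the --colsep part of my command does what I intend.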
Please confirm whether this is correct.
All the best,
Yacob
PS. For several nodes, I expect the above command to work in the same way.