
Query regarding stackoverflow question (Run million of list in PBS with parallel tool)


From: Rajiv Gandhi Govindaraj
Subject: Query regarding stackoverflow question (Run million of list in PBS with parallel tool)
Date: Fri, 4 Oct 2019 02:09:54 +0530

Dear Ole Tange,


Thanks for this great tool and for helping me out on this.

We computational chemists routinely run long, large-scale calculations on biological data, and GNU parallel is very helpful for that.


As per your suggestion, I introduced a simple function and tested it on my local machine, which has 4 CPUs, with a list of 10,000 entries.

top shows 4 processes running with --pipepart --block -10k (option 2), but only 1-2 running with the others (options 1 and 3).

Here are the completion times with and without the function in the script:


1. With the new function (example_10k.lst contains only 10,000 entries)

job_script.sh

#!/bin/bash

dowork() {
    export WDIR=/shared/TF_data/work_dir
    cd "$WDIR"
    # Read the list from stdin and run one run_script.sh job per line
    parallel --wd "$WDIR" sh run_script.sh {}
}
export -f dowork

cat example_10k.lst | dowork

Completes the job in:

real 4m4.599s
user 2m29.206s
sys 0m30.267s

2. With the new function, using --pipepart --block -10k instead of cat in job_script.sh


parallel -a example_10k.lst --pipepart --block -10 dowork


Completes the job in:

real 4m44.067s
user 2m58.884s
sys 0m46.153s


3. Without the new function


#!/bin/bash

export WDIR=/shared/TF_data/work_dir
cd "$WDIR"

cat example_10k.lst | parallel --wd "$WDIR" sh run_script.sh {}

Completes the job in:

real 4m5.139s
user 2m28.339s
sys 0m30.723s
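
To put the three timings above on a common scale, the average number of busy cores can be estimated as (user + sys) / real. This is only a rough sketch, assuming the bash `time` output format shown above; the helper names `to_seconds` and `cpu_util` are made up for illustration and are not part of GNU parallel.

```shell
#!/bin/bash

# Convert a bash `time` value such as 4m4.599s into seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, parts, "m")        # parts[1] = minutes, parts[2] = seconds plus trailing "s"
        sub(/s$/, "", parts[2])     # drop the trailing "s"
        printf "%.3f\n", parts[1] * 60 + parts[2]
    }'
}

# Estimate the average number of busy cores: (user + sys) / real.
# Arguments: real user sys, each in the 4m4.599s format.
cpu_util() {
    local real user sys
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    sys=$(to_seconds "$3")
    awk -v r="$real" -v u="$user" -v s="$sys" 'BEGIN { printf "%.2f\n", (u + s) / r }'
}

echo "option 1: $(cpu_util 4m4.599s 2m29.206s 0m30.267s)"
echo "option 2: $(cpu_util 4m44.067s 2m58.884s 0m46.153s)"
echo "option 3: $(cpu_util 4m5.139s 2m28.339s 0m30.723s)"
```

By this estimate the ratio stays below 1 in all three cases (about 0.73, 0.79 and 0.73), so on average less than one of the 4 cores was busy. That would suggest the run is limited by something other than CPU count, although the estimate cannot say what.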



Since I tested on a local machine with 4 CPUs, I wonder whether running on more CPUs (72 x 5 nodes) or on the full list (200k) would give better CPU utilization.

Any suggestions are much appreciated.


Best,

Rajiv


