parallel

bugid: loadavg_invalid_content


From: Thomas Danckaert
Subject: bugid: loadavg_invalid_content
Date: Tue, 13 Jun 2017 11:49:56 +0200 (CEST)

Dear parallel developers,

On our HPC cluster, when jobs on different nodes run parallel simultaneously, they abort with the following error message:

parallel: This should not happen. You have found a bug.
Please contact <parallel@gnu.org> and include:
* The version number: 20170522
* The bugid: loadavg_invalid_content: /home/thomasd/.parallel/tmp/sshlogin/:/loadavg

This is the command being run:
parallel --tmpdir /dev/shm/pbs.2448832.hpc-pbs --no-notice --load 100% --delay 1 /home/thomasd/create_lut.sh -m 0 -l 4 -p 0 -s 2 -v {1} -r {2} /home/thomasd/grid_hpc.cfg ::: {0..12} ::: {0..8}

When only one parallel process is running at a time, it works fine. I think the parallel jobs on different nodes, which share the same home directory, are accessing the same “loadavg” file.

As a workaround I can pass each parallel job a different $PARALLEL_HOME environment variable, and this seems to avoid the problem. When I look in those directories, they each contain a directory named after the hostname of the node used for the job (i.e. tmp/sshlogin/hpc-nXYZ), and a “:” directory (tmp/sshlogin/:).
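For reference, the workaround can be sketched roughly as follows. This is a minimal sketch under assumptions, not the exact job script: it assumes the scheduler sets PBS_JOBID (falling back to the shell PID otherwise), and the "parallel-home" directory name is purely illustrative.

```shell
#!/bin/sh
# Sketch: give each job its own PARALLEL_HOME so that concurrent jobs on
# different nodes no longer share ~/.parallel/tmp/sshlogin/:/loadavg.
# Assumptions (not from the original report): PBS_JOBID is provided by the
# scheduler, and "parallel-home.<node>.<job>" is a hypothetical naming scheme.
job_id="${PBS_JOBID:-$$}"          # fall back to the shell PID outside PBS
base_dir="${TMPDIR:-/tmp}"         # any node-local or per-job directory works
PARALLEL_HOME="${base_dir}/parallel-home.$(uname -n).${job_id}"

# Create the directory before parallel starts, then export it so that
# parallel picks it up instead of the shared ~/.parallel.
mkdir -p "$PARALLEL_HOME"
export PARALLEL_HOME
```

With these lines at the top of the job script, the parallel command shown earlier is run unchanged after them.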

Sincerely,

Thomas Danckaert
