[task #15737] slurm - openmpi - (PMIx+libevent+hwloc)
From: Boud Roukema
Subject: [task #15737] slurm - openmpi - (PMIx+libevent+hwloc)
Date: Wed, 29 Jul 2020 13:21:41 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
URL:
<https://savannah.nongnu.org/task/?15737>
Summary: slurm - openmpi - (PMIx+libevent+hwloc)
Project: Reproducible paper template
Submitted by: boud
Submitted on: Wed 29 Jul 2020 05:21:39 PM UTC
Should Start On: Wed 29 Jul 2020 12:00:00 AM UTC
Should be Finished on: Wed 29 Jul 2020 12:00:00 AM UTC
Category: None
Priority: 5 - Normal
Status: None
Privacy: Public
Percent Complete: 0%
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00
_______________________________________________________
Details:
Parallel processing across possibly non-shared memory, using MPI (the message
passing interface - a standard, not any particular software), is presently
allowed for in Maneage using openmpi. How should we compile openmpi for
reproducibility?
In practice, openmpi is normally used on a cluster or supercomputer on which
jobs are submitted to, queued by, and run (or rejected) by a (hopefully
free-software) job/user manager such as Slurm:
https://slurm.schedmd.com/ .
The computer on which a job is run is (in general) not the one from which a
batch job is submitted to slurm, e.g. with 'srun'.
So roughly speaking, as I understand it:
* user uses srun or sbatch to submit a script _X.sh_ to the slurm daemon on
the frontend H;
* slurm queues the request, and after some time may choose one or more
computers K and try to run _X.sh_ under the user's identity on those computers
K;
* the computers K each run _X.sh_, which can include a Maneage package that
compiles and runs a program P; P uses openmpi to ask the host computer and
Slurm which cpus/cores/threads it is allowed to use;
* the interaction between _X.sh_ on K -> openmpi on K (precompiled library) ->
host K + Slurm on H (and in some sense on K) is done through _PMIx_ (pmi or
pmi2), _libevent_, and _hwloc_;
* MPI means that data (arrays of bytes :)) can be sent/received among the
computers K.
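To make the flow above concrete, a minimal batch script submitted to Slurm
might look like the following. This is a sketch only: the job name, node
counts and the program name P are placeholders, and `--mpi=pmix` assumes
both Slurm and openmpi on the cluster were built with PMIx support (pmi2 is
the common alternative):

```shell
#!/bin/bash
# Hypothetical sketch of X.sh, submitted from the frontend H with:
#   sbatch X.sh
#SBATCH --job-name=maneage-mpi    # name shown by squeue
#SBATCH --nodes=2                 # ask Slurm for two computers K
#SBATCH --ntasks-per-node=4       # 4 MPI ranks on each node

# Launch program P on the allocated nodes.  srun hands each rank its
# identity and resource limits through the PMIx interface, which the
# openmpi library inside P then queries.
srun --mpi=pmix ./P
```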
So the question is: for reproducibility, how much of the chain _openmpi ->
(PMIx + libevent + hwloc)_ do we want compiled internally within Maneage, and
how much should rely on _autotools_-style automatic searching of the machine
for its preferred default libraries?
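For the "compiled internally" end of that spectrum, openmpi's configure
script lets each of the three dependencies be pointed at a specific prefix
rather than auto-detected. A sketch, assuming a recent (4.x-era) openmpi;
`$idir` is a hypothetical Maneage installation prefix:

```shell
# Sketch: configure openmpi so that hwloc, libevent and PMIx are the
# copies Maneage built under $idir, not whatever the host provides.
./configure --prefix="$idir" \
            --with-hwloc="$idir" \
            --with-libevent="$idir" \
            --with-pmix="$idir" \
            --with-slurm
# The opposite extreme is to pass no --with-* options at all and let
# configure search the host - the autotools-style default the question
# above refers to.
```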
There is no point trying to include _slurm_ itself in Maneage: the whole
point of Slurm is that the sysadmins managing a cluster use it to
automatically manage a whole bunch of users. It is system-level software that
the user's script has to interact with.
Official guide: https://slurm.schedmd.com/mpi_guide.html#open_mpi
The official guide doesn't give much in terms of practical, up-to-date
experience. Some URLs that seem useful:
https://bugs.schedmd.com/show_bug.cgi?id=5323
https://github.com/open-mpi/ompi/issues/5871
I'm trying some experiments, but any prior experience with this would help
speed things up. :)
_______________________________________________________
Reply to this item at:
<https://savannah.nongnu.org/task/?15737>
_______________________________________________
Message sent via Savannah
https://savannah.nongnu.org/