Re: Getting started with GWL 0.3.0
From: Roel Janssen
Subject: Re: Getting started with GWL 0.3.0
Date: Tue, 23 Mar 2021 21:30:14 +0100
User-agent: Evolution 3.38.4 (3.38.4-1.fc33)
On Tue, 2021-03-23 at 21:14 +0100, Ricardo Wurmus wrote:
>
> Roel Janssen <roel@gnu.org> writes:
>
> > On Tue, 2021-03-23 at 18:34 +0100, Ricardo Wurmus wrote:
> > >
> > > Before you get too enthusiastic about the GWL, though, I’d like to
> > > note that 0.3.0 has a few known bugs that are already fixed in the
> > > repository. I’ve been putting off making a new release until either
> > > Guile-AWS or Guile-DRMAA are ready and usable with the GWL.
> >
> > Is there a feature-branch to try out GWL with Guile-DRMAA? :)
>
> Unfortunately not yet.
>
> I haven’t been 100% successful with the only DRMAA-enabled cluster that
> I have access to, and it turns out that it’s not as simple as SGE’s
> “hold_jid”.
>
> It’s no longer “fire and forget”, which is a bit sad, but that’s how
> DRMAA works. We need a run-time component that keeps track of
> submitted jobs and their status and actively starts held jobs when the
> prerequisites have finished.
That's unfortunate, but I believe having a daemon that keeps track of
the workflow opens possibilities for "cloud" "orchestration".
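One way to picture the run-time component Ricardo describes: submit every job in a held state, then poll and release each job once all of its prerequisites have finished. This is a minimal sketch, not GWL code. `FakeScheduler` and its methods are hypothetical stand-ins for a DRMAA session (DRMAA itself offers a hold state at submission time and a release control action), and the fake simply finishes every released job by the next poll.

```python
from enum import Enum


class State(Enum):
    HELD = "held"
    QUEUED = "queued"
    DONE = "done"


class FakeScheduler:
    """Hypothetical in-memory stand-in for a DRMAA session."""

    def __init__(self):
        self.jobs = {}  # job id -> State
        self.next_id = 1

    def submit_held(self, name):
        # 'name' is unused here; a real backend would put it in the job template.
        job_id = self.next_id
        self.next_id += 1
        self.jobs[job_id] = State.HELD
        return job_id

    def release(self, job_id):
        self.jobs[job_id] = State.QUEUED

    def status(self, job_id):
        return self.jobs[job_id]

    def tick(self):
        # Pretend every queued job has finished by the next poll.
        for job_id, state in self.jobs.items():
            if state is State.QUEUED:
                self.jobs[job_id] = State.DONE


def run_workflow(scheduler, deps):
    """Submit every job held, then release each one once all of its
    prerequisites are DONE. 'deps' maps a job name to the names it needs.
    Returns the names in the order they were released."""
    ids = {name: scheduler.submit_held(name) for name in deps}
    order = []
    pending = set(deps)
    while pending:
        ready = [name for name in sorted(pending)
                 if all(scheduler.status(ids[d]) is State.DONE
                        for d in deps[name])]
        if not ready:
            # In this fake every released job finishes on tick(), so no
            # progress means a dependency cycle.
            raise RuntimeError("unsatisfiable dependencies: %s" % sorted(pending))
        for name in ready:
            scheduler.release(ids[name])
            order.append(name)
        pending -= set(ready)
        scheduler.tick()  # a real monitor would sleep and poll instead
    return order
```

With `{"align": [], "index": [], "call": ["align", "index"], "report": ["call"]}`, the jobs are released in the order align, index, call, report.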
> It’s not clear to me if and how we should persist workflow state. The
> GWL will submit all jobs to the scheduler in a held state and then
> change their status when it’s their turn. I wonder if and how we should
> handle the case where the GWL runtime monitor dies and is restarted.
> The easiest way is to simply kill all queued up jobs, but I don’t know
> if there’s a better approach.
>
> Ideas?
I think killing/removing queued jobs when the runtime monitor exits is a
good idea!
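A sketch of the cleanup idea, which also gives a cheap answer to the persistence question: the monitor writes only the IDs of the jobs it submitted to a state file, kills them on a normal exit, and a restarted monitor kills whatever a crashed predecessor left behind. `Monitor`, `FakeScheduler` and the JSON state file are assumptions for illustration. A real scheduler would probably report an error when terminating a job that has already finished, and a monitor killed with SIGKILL never runs `__exit__`, which is why the restart path matters.

```python
import json
import os


class FakeScheduler:
    """Hypothetical stand-in for a DRMAA session; tracks live job ids."""

    def __init__(self):
        self.live = set()
        self.next_id = 1

    def submit_held(self):
        job_id = self.next_id
        self.next_id += 1
        self.live.add(job_id)
        return job_id

    def terminate(self, job_id):
        self.live.discard(job_id)


class Monitor:
    """Records every submitted job id in a state file so the jobs can be
    killed when the monitor exits, or by a restarted monitor after a crash."""

    def __init__(self, scheduler, state_file):
        self.scheduler = scheduler
        self.state_file = state_file
        self.job_ids = []

    def __enter__(self):
        self.kill_leftovers()  # clean up after a previous monitor that died
        return self

    def submit_held(self):
        job_id = self.scheduler.submit_held()
        self.job_ids.append(job_id)
        self._save()
        return job_id

    def _save(self):
        # Write to a temporary file, then rename it over the state file.
        tmp = self.state_file + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.job_ids, f)
        os.replace(tmp, self.state_file)

    def kill_leftovers(self):
        if os.path.exists(self.state_file):
            with open(self.state_file) as f:
                for job_id in json.load(f):
                    self.scheduler.terminate(job_id)
            os.remove(self.state_file)

    def __exit__(self, exc_type, exc, tb):
        for job_id in self.job_ids:
            self.scheduler.terminate(job_id)
        if os.path.exists(self.state_file):
            os.remove(self.state_file)
        return False  # never swallow the exception
```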
This may no longer be suitable, but I wrote a "qsub" command that
translates to "squeue" here:
https://github.com/roelj/qsub-slurm
Could we use the same approach? It works because jobs are submitted in
order. The look-up mechanism can be found here:
https://github.com/roelj/qsub-slurm/blob/master/qsub.in#L233-L253
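A hypothetical sketch of why submission order makes a name-based lookup work (this is not the code at the linked lines): each submitted job is recorded in a name-to-ID table, so by the time a job names its prerequisites in an SGE-style `hold_jid` list, their SLURM IDs are already known and can be turned into an `sbatch --dependency=afterok:...` argument. `JobTable` is an illustrative name, and only job names are handled, not the job IDs or wildcard patterns SGE also accepts.

```python
class JobTable:
    """Hypothetical name -> job id table, filled in submission order."""

    def __init__(self):
        self.ids_by_name = {}

    def record(self, name, job_id):
        # A later job with the same name shadows the earlier one.
        self.ids_by_name[name] = job_id

    def dependency_flag(self, hold_jid):
        """Translate an SGE-style comma-separated hold_jid list of job names
        into an sbatch --dependency argument. This only works because every
        prerequisite was submitted (and recorded) before its dependents."""
        names = [n for n in hold_jid.split(",") if n]
        if not names:
            return []
        missing = [n for n in names if n not in self.ids_by_name]
        if missing:
            raise KeyError("not submitted yet: %s" % ", ".join(missing))
        ids = ":".join(str(self.ids_by_name[n]) for n in names)
        return ["--dependency=afterok:" + ids]
```

After recording `align` as 101 and `index` as 102, `dependency_flag("align,index")` returns `["--dependency=afterok:101:102"]`.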
I have access to a SLURM cluster (I don't know which version of DRMAA
it supports), but I can test it.
Kind regards,
Roel Janssen