gwl-devel

Re: Getting started with GWL 0.3.0


From: Roel Janssen
Subject: Re: Getting started with GWL 0.3.0
Date: Tue, 23 Mar 2021 21:30:14 +0100
User-agent: Evolution 3.38.4 (3.38.4-1.fc33)

On Tue, 2021-03-23 at 21:14 +0100, Ricardo Wurmus wrote:
> 
> Roel Janssen <roel@gnu.org> writes:
> 
> > On Tue, 2021-03-23 at 18:34 +0100, Ricardo Wurmus wrote:
> > > 
> > > Before you get too enthusiastic about the GWL, though, I’d like to
> > > note that 0.3.0 has a few known bugs that are already fixed in the
> > > repository.  I’ve been putting off making a new release until
> > > either Guile-AWS or Guile-DRMAA are ready and usable with the GWL.
> > 
> > Is there a feature-branch to try out GWL with Guile-DRMAA? :)
> 
> Unfortunately not yet.
> 
> I haven’t been 100% successful with the only DRMAA-enabled cluster that
> I have access to, and it turns out that it’s not as simple as SGE’s
> “hold_jid”.
> 
> It’s no longer “fire and forget”, which is a bit sad, but that’s how
> DRMAA works.  We need a run-time component that keeps track of
> submitted jobs and their status and actively starts held jobs when
> the prerequisites have finished.

That's unfortunate, but I believe having a daemon that keeps track of
the workflow opens up possibilities for "cloud" "orchestration".
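
To make the idea of such a run-time component concrete, here is a
minimal sketch in Python.  It is not GWL or DRMAA code: the Job and
Monitor classes, the state names and the job names are all
hypothetical, and the "scheduler" is only an in-memory dictionary.  It
shows the bookkeeping only: every job starts out held and is released
once all of its prerequisites are done.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job record; not a GWL or DRMAA type."""
    name: str
    prerequisites: set = field(default_factory=set)
    state: str = "held"  # assumed life cycle: held -> running -> done


class Monitor:
    """Tracks job states and releases held jobs that are ready."""

    def __init__(self, jobs):
        self.jobs = {job.name: job for job in jobs}

    def release_ready(self):
        """Release each held job whose prerequisites are all done."""
        released = []
        for job in self.jobs.values():
            ready = all(self.jobs[p].state == "done"
                        for p in job.prerequisites)
            if job.state == "held" and ready:
                # A real monitor would ask the scheduler to lift the hold.
                job.state = "running"
                released.append(job.name)
        return released

    def mark_done(self, name):
        """Record that the scheduler reported the job as finished."""
        self.jobs[name].state = "done"


monitor = Monitor([
    Job("align"),
    Job("sort", {"align"}),
    Job("call-variants", {"sort"}),
])
first = monitor.release_ready()   # only "align" has no prerequisites
monitor.mark_done("align")
second = monitor.release_ready()  # "sort" becomes ready
```

A real monitor would call release_ready in a loop, driven by status
updates from the scheduler rather than by explicit mark_done calls.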

> It’s not clear to me if and how we should persist workflow state.  The
> GWL will submit all jobs to the scheduler in a held state and then
> change their status when it’s their turn.  I wonder if and how we should
> handle the case where the GWL runtime monitor dies and is restarted.
> The easiest way is to simply kill all queued up jobs, but I don’t know
> if there’s a better approach.
> 
> Ideas?

I find killing/removing queued jobs upon exiting the runtime monitor a
good idea!
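
A rough sketch of that clean-up in Python, under the assumption that
the monitor remembers the ids of the jobs it submitted.  FakeScheduler,
submit_held, cancel and run_monitor are hypothetical names made up for
this illustration; they do not correspond to any real scheduler or
DRMAA API.

```python
class FakeScheduler:
    """Hypothetical in-memory stand-in for a cluster scheduler."""

    def __init__(self):
        self.queue = {}
        self.next_id = 1

    def submit_held(self, name):
        """Queue a job in a held state and return its id."""
        job_id = self.next_id
        self.next_id += 1
        self.queue[job_id] = name
        return job_id

    def cancel(self, job_id):
        """Remove a job from the queue if it is still there."""
        self.queue.pop(job_id, None)


def run_monitor(scheduler, names, work):
    """Submit all jobs held, run the monitor body, always clean up."""
    submitted = [scheduler.submit_held(name) for name in names]
    try:
        work(submitted)
    finally:
        # Runs on normal exit and on exceptions alike.
        for job_id in submitted:
            scheduler.cancel(job_id)


def crash(job_ids):
    """Simulate the monitor dying part-way through a workflow."""
    raise RuntimeError("monitor died")


scheduler = FakeScheduler()
try:
    run_monitor(scheduler, ["align", "sort"], crash)
except RuntimeError:
    pass
remaining = dict(scheduler.queue)  # empty: nothing left queued
```

Note that a try/finally block only covers exits the process can
observe.  If the monitor is killed with SIGKILL or the machine goes
down, nothing is cleaned up, so held jobs would still need to be found
and removed on restart.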

Maybe not suitable anymore, but I wrote a "qsub" command that
translates to "squeue" here:
https://github.com/roelj/qsub-slurm

Could we use the same approach?  It works because jobs are submitted in
order.  The look-up mechanism can be found here:
https://github.com/roelj/qsub-slurm/blob/master/qsub.in#L233-L253


I have access to a SLURM cluster (I don't know which version of DRMAA
it supports), but I can test it.

Kind regards,
Roel Janssen




