Re: [gwl-devel] Next steps for the GWL
Thu, 6 Jun 2019 08:23:32 -0500
On Thu, Jun 06, 2019 at 02:19:04PM +0200, Ricardo Wurmus wrote:
> Hi Simon,
> > (+ Pjotr because I am sure he has an interesting opinion but not sure
> > he closely reads this list ;-)
I read it :)
> > On Mon, 3 Jun 2019 at 18:18, Ricardo Wurmus
> > <address@hidden> wrote:
> >> > - what about a bridge with CWL?
> >> I’m open to this idea, but it would need to be well-defined. What does
> >> it really mean? Generating CWL files from GWL workflows? That really
> >> shouldn’t be too hard. Anything else, however, is hard for me to
> >> imagine.
> > Well, I point out previous threads about this topic:
> > https://lists.gnu.org/archive/html/guix-devel/2018-01/msg00428.html
> > https://lists.gnu.org/archive/html/gwl-devel/2019-02/msg00019.html
> > 1-
> > Generating CWL from GWL should be nice. It should ease the use of
> > already in-place platform and tools (AWS, etc.)
> Generating CWL from GWL should be easy, but it’s also not all that
> useful. The GWL takes care of software deployment, so not only should
> we generate CWL files but also generate (and upload?) Docker images and
> make the CWL file reference them.
> The tooling for CWL… seems a little less substantial and focused than it
> first appears. The cwltool can only run CWL workflows locally — no
> DRMAA, no AWS. All the other runners that are listed on the CWL website
> are either very limited or very large environments where CWL execution
> is not necessarily the primary purpose (cf Galaxy or Arvados).
> Still, I think it’s the most meaningful connection the GWL can have with
> the CWL: using the GWL as a high-level representation which “compiles”
> down to a lower-level representation of CWL + Docker images when needed.
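To make the "compiles down" idea concrete, here is a minimal Python sketch of what emitting one CWL tool description with a Docker image reference could look like. Everything on the GWL side is an assumption: `process_to_cwl`, its arguments, and the image name are hypothetical and not part of the GWL. Only the CWL field names (`cwlVersion`, `class`, `baseCommand`, `hints`, `DockerRequirement`, `dockerPull`) come from the CWL v1.0 specification. Building and uploading the image is left out entirely.

```python
import json


def process_to_cwl(name, command, inputs, docker_image):
    """Render a simplified, hypothetical process description as a CWL
    CommandLineTool document (a plain dict, serialisable as JSON)."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "id": name,
        "baseCommand": command,
        # The image would have to be generated from the process's package
        # list; here it is just a name passed in by the caller.
        "hints": [{"class": "DockerRequirement", "dockerPull": docker_image}],
        # Every input is treated as a positional File argument (assumption).
        "inputs": {
            input_name: {"type": "File", "inputBinding": {"position": position}}
            for position, input_name in enumerate(inputs, start=1)
        },
        # Capture standard output as the single result (assumption).
        "outputs": {"result": {"type": "stdout"}},
        "stdout": name + ".out",
    }


# Hypothetical process: count the lines of one input file.
tool = process_to_cwl(
    "count-lines", ["wc", "-l"], ["sample"], "example.org/gwl/count-lines:latest"
)
cwl_text = json.dumps(tool, indent=2)
```

CWL documents may be written as JSON as well as YAML, so `cwl_text` could be saved to a `.cwl` file as it is. The hard part, producing the image that `dockerPull` names, is exactly the part this sketch does not cover.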
> > 2-
> > Use CWL as a process. A lot of work has been done by Pjotr and
> > reported here 
> > 
> > https://guix-hpc.bordeaux.inria.fr/blog/2019/01/creating-a-reproducible-workflow-with-cwl/
> Yes, this works, of course, but that’s a level of integration that’s
> extremely limited, in my opinion. Using Guix with the CWL is fine as
> the blog post demonstrates, but there is very little to be gained and
> much to be lost when embedding CWL in a GWL workflow. The only thing
> this enables is reusing existing CWL workflows as a GWL “process”.
> There is no meaningful integration – the embedded CWL workflow is a
> second-class citizen that cannot benefit from any of the GWL features.
> If the CWL workflow is connected to the GWL via cwltool then the only
> way to run the workflow on a DRMAA-supported cluster or a bunch of
> SSH-connected servers, or AWS EC2 instances is to wrap it up in a GWL
> context. The GWL treats the process as its smallest unit of
> organisation, so a CWL workflow that’s run as a GWL process cannot
> really be scaled. If the user has a different CWL execution environment
> (such as an Arvados installation), the CWL workflow embedded in the GWL
> will not be able to make use of it. It would forever be tied to the
> particular version of cwltool in Guix.
> I’d rather not advocate this use of the CWL in the GWL. It might sound
> good (“The GWL is compatible with the CWL!”), but ultimately it’s a
> really awkward connection that is bound to lead to a great deal of
> Does this make sense?
Yes. Personally I also think the CWL is flawed. It overcomplicates
things and the reference implementation is pretty crappy. If we get
GWL to work in my environment I would think it a breath of fresh air.
Not to say that the CWL does not have some bad ideas (triple
negative). You can read my blog for that.
> I don’t want to be dismissive. It would be great if we could come up
> with something that’s mutually beneficial for CWL users and GWL users
> alike, but I feel that our options are very limited. I’m still open to
> ideas and use-case scenarios.
We can probably just mix the two. I mean the main benefit of the CWL
is *sharing* workflows that have been described by others. That is the
point of the CWL, and even at that it has not proven all that great
(after all this time how much is shared?).
Since CWL and GWL can use the same file system and job submission
system I think it is pretty OK for GWL to ignore the CWL and either
send data from one to the other or execute CWL pipelines from GWL.
Both possible without much work.
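As a rough illustration of the "execute CWL pipelines from GWL" option, the step that a wrapping process would run boils down to one call to a CWL runner on the shared file system. The sketch below assumes cwltool is the runner and is available on PATH. The helper names `cwltool_command` and `run_cwl` and the file names are hypothetical, and nothing here is GWL API.

```python
import shutil
import subprocess


def cwltool_command(workflow, job, outdir):
    """Build the argument list for running a CWL workflow with cwltool,
    writing the results to `outdir` on the shared file system."""
    return ["cwltool", "--outdir", outdir, workflow, job]


def run_cwl(workflow, job, outdir):
    """Run the workflow; raises if cwltool is missing or exits non-zero."""
    if shutil.which("cwltool") is None:
        raise RuntimeError("cwltool not found on PATH")
    return subprocess.run(cwltool_command(workflow, job, outdir), check=True)


# Hypothetical file names, for illustration only.
argv = cwltool_command("align.cwl", "align-job.yml", "results")
```

Whatever lands in `results` can then be picked up as ordinary input files by later steps, which is all the "mixing" would amount to.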