guix-devel

Re: Google Summer of Code 2023 Inquiry


From: Kyle
Subject: Re: Google Summer of Code 2023 Inquiry
Date: Fri, 31 Mar 2023 00:52:48 +0000

As a statistician who always wants to get the most information for the least effort, I am particularly interested in being able to reprioritize workflow jobs interactively among jobs that occupy interchangeable positions in the topological sort, that is, jobs whose dependencies are all satisfied and which could run in any order. I thought perhaps this would be possible with GWL if it could talk to Slurm via DRMAA version 2 (https://en.wikipedia.org/wiki/DRMAA). This would also be more readily useful to researchers if Guix had a conveniently available Slurm service that worked out of the box, even on a single machine.
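To make the reprioritization idea concrete, here is a minimal sketch in Python (not GWL or DRMAA code). It is a variant of Kahn's topological sort that consults a mutable priority table each time it picks the next ready job, so priorities can be changed between steps. The names `schedule`, `deps`, and `priority` are hypothetical, and the sketch assumes every job, including every prerequisite, appears as a key in `deps`.

```python
def schedule(deps, priority):
    """Yield jobs in dependency order, re-reading `priority` at every step.

    deps:     dict mapping each job to the set of jobs it depends on
              (assumption: every job appears as a key).
    priority: dict mapping job -> number; lower runs first. It may be
              mutated by the caller between steps.
    """
    remaining = {job: set(pre) for job, pre in deps.items()}
    dependents = {job: [] for job in deps}
    for job, pre in deps.items():
        for p in pre:
            dependents[p].append(job)

    # Jobs with no unmet prerequisites are interchangeable in the sort.
    ready = {job for job, pre in remaining.items() if not pre}
    done = 0
    while ready:
        # Pick by current priority; the job name breaks ties deterministically.
        job = min(ready, key=lambda j: (priority.get(j, 0), j))
        ready.remove(job)
        yield job
        done += 1
        for d in dependents[job]:
            remaining[d].discard(job)
            if not remaining[d]:
                ready.add(d)
    if done != len(deps):
        raise ValueError("dependency cycle detected")


deps = {
    "prep": set(),
    "fit_a": {"prep"},
    "fit_b": {"prep"},
    "report": {"fit_a", "fit_b"},
}
priority = {"fit_a": 1, "fit_b": 2}

jobs = schedule(deps, priority)
order = [next(jobs)]      # "prep" is the only job ready at the start
priority["fit_b"] = 0     # reprioritize while the workflow is in progress
order.extend(jobs)        # the remaining jobs honor the new priority
print(order)
```

Because the ready set is re-scanned on every step, a priority change takes effect immediately without disturbing the dependency constraints. A real scheduler would submit jobs to the cluster rather than yield them.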

Stepping back, there might be a more ambitious question hidden in there: how to handle nondeterminism in a deterministic workflow manager. If no information arrives from outside during the run, the problem just involves choosing your random seeds up front. However, I would prefer to write a procedure that keeps reprioritizing labeled subjobs within their associated containers until I either hit a resource limit or achieve certain target statistical diagnostics. I would then want GWL to tell me how to replay my build after the fact, so that the run is reproducible even though I didn't know up front where I needed to focus my computations and let the computer decide. Making that sort of thing possible might be a longer-term effort, but working out what's needed for the initial steps might be a fun project.
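A rough sketch of what "replay after the fact" could look like, in plain Python rather than GWL. An adaptive loop records every (label, seed) decision it makes, and a replay function re-executes that log deterministically. Everything here is hypothetical illustration: `run_subjob` stands in for a real containerized job, and the standard error of the mean stands in for whatever diagnostic is actually targeted.

```python
import random
import statistics


def run_subjob(label, seed):
    """Hypothetical subjob: its output depends only on (label, seed)."""
    rng = random.Random(f"{label}:{seed}")
    scale = 1.0 + len(label)  # made-up: labels differ in noise level
    return [rng.gauss(0.0, scale) for _ in range(20)]


def standard_error(samples):
    return statistics.stdev(samples) / len(samples) ** 0.5


def adaptive_run(labels, target_se, max_jobs):
    """Keep running the noisiest label until the target or the budget is hit.

    Returns the collected samples and a log of (label, seed) decisions.
    """
    samples = {label: run_subjob(label, 0) for label in labels}
    log = [(label, 0) for label in labels]
    while len(log) < max_jobs:
        worst = max(labels, key=lambda l: standard_error(samples[l]))
        if standard_error(samples[worst]) <= target_se:
            break  # every label meets the diagnostic target
        seed = sum(1 for l, _ in log if l == worst)  # next unused seed
        samples[worst] += run_subjob(worst, seed)
        log.append((worst, seed))
    return samples, log


def replay(log):
    """Re-run the logged decisions in order, with no adaptive logic."""
    samples = {}
    for label, seed in log:
        samples.setdefault(label, []).extend(run_subjob(label, seed))
    return samples


samples, log = adaptive_run(["a", "bbb"], target_se=0.2, max_jobs=30)
print(len(log), replay(log) == samples)
```

The point of the design is that the adaptive choices are data, not code: once the log exists, the run is an ordinary deterministic workflow whose inputs are the logged labels and seeds.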

On March 30, 2023 7:27:37 PM EDT, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:
> Hi Ricardo,
>
> On 3/23/23 03:58, Ricardo Wurmus wrote:
>> Hi,
>>
>> Spencer Skylar Chan <schan12@terpmail.umd.edu> writes:
>>
>>> One approach could be to add CWL import/export capabilities to
>>> GWL. Then Snakemake/GWL conversion would be a 2 step process, using
>>> CWL as an intermediate step:
>>>
>>> 1. Snakemake -> CWL
>>> 2. CWL -> GWL
>>
>> This seems doable.
>
> Great! I've been reading the chapter in Evolutionary Genomics on different scalable workflows to understand this process better.
>
>>> However, CWL is not as expressive as Snakemake. There may be some
>>> details that are lost from Snakemake workflows.
>>>
>>> So a 1-step Snakemake/GWL transpiler could be interesting, as both
>>> Snakemake/GWL use a domain-specific language inside a general purpose
>>> language (Python/Guile respectively). There may be a possibility to
>>> achieve more "accurate" translations between workflows.
>>
>> Compared to the previous approach this seems vastly more complex. It’s
>> one thing to *execute* Snakemake code without running it through Python,
>> but quite a bit more challenging to transpile Python to Scheme.
>>
>> Personally, I wouldn’t know where to start. Do you have an idea
>> already?
>
> Actually I was hoping you might have some ideas :)
> I do think that if the execution of the pipeline is more important than its representation (Snakemake or otherwise), then it would make more sense to focus efforts on increasing GWL's capabilities.
>
> Thanks,
> Skylar
