Re: [gwl-devel] merging “processes” and “restrictions”
From: zimoun
Subject: Re: [gwl-devel] merging “processes” and “restrictions”
Date: Tue, 22 Jan 2019 09:49:37 +0100
Hi Ricardo,
On Mon, 21 Jan 2019 at 23:51, Ricardo Wurmus <address@hidden> wrote:
> > Is it possible to turn off the test (make check) when building hello ?
>
> This is not supported in Guix, so there’s nothing I can do in the GWL.
Ok.
>
> > Cosmetic comment. :-)
> > About the `A -> B' which means A depends on B.
> > To me, the arrow is counterintuitive, notationally speaking. :-)
> > Because the data flow is going from B to A.
> > Even if this notation is usual when speaking of dependencies and graph.
>
> The arrow is read as “depends on”. If you want to we could just as well
> support an arrow in the opposite direction, as it really has no
> meaning. But I think that would be more confusing.
In the Snakemake doc about graphs and the DAG [1], they choose the opposite:
"A -> B" means B depends on A, because it expresses how the data flows,
i.e., the output of A is the input of B.
It is the same for CWL [2].
I agree that it is not the usual way to express dependencies (e.g., in UML).
If we choose the snakemake/cwl meaning for `->' then it will not be
consistent with the meaning of the arrow of `guix graph'.
From my perspective, the snakemake/cwl way is more intuitive. But
what is intuitive for one person is not for someone else. :-)
Since we are speaking about cosmetics, take the example from the graph [3]. I
find this more readable:
1.
  (graph
    (samsort samindex -> bcftools_call))
than
2.
  (graph
    (bcftools_call <- samsort samindex))
than
3.
  (graph
    (bcftools_call -> samsort samindex))
I do not know, I feel like I am cutting a hair into four pieces. :-)
(a French expression for splitting hairs :-)
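To make the two readings concrete, here is a minimal sketch in plain Scheme. It is not GWL code: `split-at-arrow', `data-flow-edges' and `dependency-edges' are hypothetical helpers that I made up for illustration. They show that clause 1 (read as data flow) and clause 3 (read as "depends on") describe the same producer/consumer pairs.

```scheme
;; Hypothetical helper, not part of the GWL: split a clause at the arrow
;; symbol and return (left-side . right-side).
(define (split-at-arrow clause arrow)
  (let loop ((left '()) (rest clause))
    (cond ((null? rest) (error "no arrow in clause" clause))
          ((eq? (car rest) arrow) (cons (reverse left) (cdr rest)))
          (else (loop (cons (car rest) left) (cdr rest))))))

;; Data-flow reading (Snakemake/CWL): producers -> consumers.
;; Returns a list of (producer . consumer) pairs.
(define (data-flow-edges clause)
  (let* ((parts (split-at-arrow clause '->))
         (producers (car parts))
         (consumers (cdr parts)))
    (apply append
           (map (lambda (p)
                  (map (lambda (c) (cons p c)) consumers))
                producers))))

;; Dependency reading (as in `guix graph'): dependents -> dependencies.
;; Also returns (producer . consumer) pairs, so both readings can be compared.
(define (dependency-edges clause)
  (let* ((parts (split-at-arrow clause '->))
         (dependents (car parts))
         (dependencies (cdr parts)))
    (apply append
           (map (lambda (p)
                  (map (lambda (d) (cons p d)) dependents))
                dependencies))))

(data-flow-edges '(samsort samindex -> bcftools_call))
;; => ((samsort . bcftools_call) (samindex . bcftools_call))
(dependency-edges '(bcftools_call -> samsort samindex))
;; => ((samsort . bcftools_call) (samindex . bcftools_call))
```

So the choice really is cosmetic: only the side of the arrow on which the producers are written changes, not the resulting graph.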
[1]
https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#step-4-indexing-read-alignments-and-visualizing-the-dag-of-jobs
[2]
https://view.commonwl.org/workflows/github.com/common-workflow-language/cwltool/blob/master/cwltool/schemas/v1.0/v1.0/step-valuefrom3-wf.cwl
[3] https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#id1
> > From a simple user perspective, I find more readable the current
> > version with `graph'. Because I am able to see the flow even if I do
> > not know about the processes fry, bake and take.
>
> Right. I also prefer the explicit “graph” syntax. With “link”
> (formerly “connect”) it’s *possible* but not required to automatically
> link up all of the processes. I suspect that this is more in line with
> what Snakemake users might expect.
Instead of `link', why not `auto-link'?
> > From my point of view, the `let' part fixes the entry point or some
> > specific location of outputs (for debugging purpose?).
> >
> > (define (eat input output)
> > (process
> > (name "Eat")
> > (data-inputs input)
> > (outputs output)))
> >
> > (define (cook input output)
> > (process
> > (name "Cook")
> > (data-inputs input)
> > (outputs output)))
> >
> > (define (take input output)
> > (process
> > (name "Take")
> > (data-inputs input)
> > (outputs output)))
> >
> > (workflow
> > (processes
> > (let ((take-choc (inputs take "/path/to/chocolate"))
> > (take-cake (outputs take "/path/to/store/cake"))
> > (miam (outputs eat "/path/to/my/mouth")))
> > (graph
> > (cook -> take-choc)
> > (take-cake -> cook)
> > (miam -> take-cake)))))
> >
> > If the inputs/outputs are not specified in the `let' part, then they
> > are automatically stored somewhere in /tmp/ or elsewhere and then
> > (optionally) removed when all the workflow is done.
> >
> > I imagine `inputs'/`outputs' returning a curried process, somehow.
> >
> > And similarly about options, e.g,
> > (define* (cook input output #:optional temp-woven)
> > blah)
> >
> >
> > Does it make sense ?
>
> This seems to be from the perspective of data flow as you indicated
> earlier. I’m not sure I fully understand it, but I give it a try. (To
> me it seems similar to continuations.)
I am not familiar with continuations, but yes, it seems similar now that you say it. :-)
Thank you for taking the time to give it a try.
> Expressed as a data flow the workflow looks like this:
>
> (take "chocolate") => cook => (take "cake") => miam
>
> At each step we generate a value that can be processed by the next
> step. This looks suspiciously like an Arrow[1].
You expressed my thoughts better than I did. :-)
>
> [1]: https://www.haskell.org/arrows/syntax.html
>
> (push "chocolate"
> (>>> take cook take miam))
>
> i.e. we push the value “chocolate” into a chain where a procedure’s
> outputs are connected to the next procedure’s inputs.
>
> The example makes it a bit hard to think about this clearly — what about
> the second invocation of “take”? What about multiple inputs? Isn’t
> this just function composition and application?
I agree, multiple inputs or outputs would be an issue when composing.
Say that `take' takes 2 inputs, `a' and `b'. We could require that they
be packed as a list (a b), and the process writer would have to
unpack them.
Now say that `cook' returns 3 outputs, `x', `y' and `z'. They
are also packed as a list.
However, how do we encode the fact that `a' corresponds to `z', and `b' to `y'?
You need some kind of dummy process that unpacks and repacks, so that the
"types" of the two processes agree.
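  (push
    (>>> take cook dumb take miam))

  ;; Pseudocode: pick `y' and `z' out of cook's outputs (x y z)
  ;; and repack them as (z y), i.e. as the (a b) that take expects.
  (define (dumb input output)
    (data-inputs ((u (cadr input))     ; y
                  (v (caddr input))))  ; z
    (outputs (v u)))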
I do not know whether it makes sense, or whether it is usable and better.
I just find it more "functional".
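Here is a small runnable sketch of the idea in plain Scheme, outside of the GWL. Everything in it is an assumption made for illustration: `>>>' is defined as simple left-to-right composition, and `cook', `take' and `dumb' are toy procedures that take one list and return one list, not real processes.

```scheme
;; Left-to-right composition: ((>>> f g h) x) = (h (g (f x))).
(define (>>> . procs)
  (lambda (x)
    (let loop ((value x) (rest procs))
      (if (null? rest)
          value
          (loop ((car rest) value) (cdr rest))))))

;; Toy stand-in: cook takes a one-element list and returns
;; three outputs packed as a list (x y z).
(define (cook inputs)
  (let ((item (car inputs)))
    (list (list 'scraps item) (list 'icing item) (list 'cake item))))

;; Toy stand-in: take expects two inputs packed as a list (a b).
(define (take inputs)
  (list 'taken (car inputs) (cadr inputs)))

;; The adapter: keep `y' and `z' from cook's outputs and repack them
;; in the order that take expects, i.e. (a b) = (z y).
(define (dumb outputs)
  (let ((y (cadr outputs))
        (z (caddr outputs)))
    (list z y)))

(define pipeline (>>> cook dumb take))

(pipeline '(chocolate))
;; => (taken (cake chocolate) (icing chocolate))
```

Without `dumb' in the chain, `take' would silently pick up `x' and `y' instead, which is exactly the mismatch that the adapter is there to prevent.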
>
> x >– A –> B —> C –> E –> F
> | `––> D ––––––/
> `–––––––/
>
> x is the input to the data flow.
>
> (flow (x)
> (a <- (A x)) ; apply A and bind output to “a”
> (b <- (B a)) ; apply B and bind output to “b”
> (e <- (>>> C E)) ; apply C and then E, bind the output to “e”
> (d <- (D a b)) ; apply D and bind the output to “d”
> (-> (F e d))) ; return F applied to “e” and “d”
>
> “flow” would somehow figure out in what order to run things. I feel
> that there should be a better way to express this, but I haven’t found
> one.
Yes. This is already nice! :-)
And the user does not have to manage the names of all the outputs by hand.
In other words, say the user has already computed your workflow with
`x' set to /path/to/my-file.
Then this user writes another flow:
  (flow (x)
    (z <- ((>>> A B) x))
    (-> (G z)))
When this second flow is applied to /path/to/my-file, the result `z'
is already in the CAS (see `b' in your flow), so only (G z) is computed.
The dream would be:
  (flow (x)
    (-> ((>>> A B G) x)))
and to automatically detect that the composition `B . A' has already been
computed for the value /path/to/my-file.
Well, I am dreaming... :-)
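To show what I mean by detecting an already computed prefix, here is a minimal sketch in plain Scheme. It makes strong assumptions: the "CAS" is only an in-memory association list, results are keyed by the chain of step names plus the original input (a real store would presumably hash the process definitions and the file contents instead), and `run-chain', `cache' and `computed' are hypothetical names, not GWL API.

```scheme
;; Hypothetical in-memory stand-in for the CAS: an association list
;; mapping (step-names-so-far . input) to a result.
(define cache '())
(define computed '())   ; log of the steps actually run, newest first

(define (cache-ref key)
  (assoc key cache))    ; assoc compares keys with equal?

(define (cache-set! key value)
  (set! cache (cons (cons key value) cache)))

;; steps is a list of (name . procedure) pairs, applied left to right.
;; Each prefix of the chain is looked up before it is computed.
(define (run-chain steps input)
  (let loop ((rest steps) (prefix '()) (value input))
    (if (null? rest)
        value
        (let* ((name (car (car rest)))
               (proc (cdr (car rest)))
               (key  (cons (cons name prefix) input))
               (hit  (cache-ref key)))
          (if hit
              (loop (cdr rest) (cons name prefix) (cdr hit))
              (let ((result (proc value)))
                (set! computed (cons name computed))
                (cache-set! key result)
                (loop (cdr rest) (cons name prefix) result)))))))

;; Toy steps that only append a suffix to a file name.
(define A (cons 'A (lambda (x) (string-append x ".a"))))
(define B (cons 'B (lambda (x) (string-append x ".b"))))
(define G (cons 'G (lambda (x) (string-append x ".g"))))

(run-chain (list A B) "/path/to/my-file")    ; runs A, then B
(run-chain (list A B G) "/path/to/my-file")  ; A and B are cache hits, only G runs
;; => "/path/to/my-file.a.b.g"
computed
;; => (G B A)
```

Each step appears only once in `computed', even though A and B are part of both chains.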
All the best,
simon