Re: [gwl-devel] merging “processes” and “restrictions”


From: zimoun
Subject: Re: [gwl-devel] merging “processes” and “restrictions”
Date: Tue, 22 Jan 2019 09:49:37 +0100

Hi Ricardo,

On Mon, 21 Jan 2019 at 23:51, Ricardo Wurmus <address@hidden> wrote:

> > Is it possible to turn off the test (make check) when building hello ?
>
> This is not supported in Guix, so there’s nothing I can do in the GWL.

Ok.

>
> > Cosmetic comment. :-)
> > About the `A -> B' which means A depends on B.
> > To me, the arrow is counterintuitive, notationally speaking. :-)
> > Because the data flow is going from B to A.
> > Even if this notation is usual when speaking of dependencies and graph.
>
> The arrow is read as “depends on”.  If you want to we could just as well
> support an arrow in the opposite direction, as it really has no
> meaning.  But I think that would be more confusing.

From the Snakemake doc about graphs and DAGs [1]: they choose "A -> B" to
mean that B depends on A, because it expresses how the data flows, i.e. the
output of A is the input of B.
It is the same for CWL [2].
I agree that this is not the usual way to express dependencies (e.g. in UML).
If we choose the snakemake/cwl meaning for `->', then it will not be
consistent with the meaning of the arrow in `guix graph'.

From my perspective, the snakemake/cwl way is more intuitive. But
what is intuitive for one person is not for another. :-)
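
To make the two readings concrete, here is a minimal sketch in plain Guile
Scheme (not GWL syntax). The edge representation and the names
`depends-on-edges' and `flip-edges' are assumptions made up for this
illustration; the point is only that one convention is the other with every
edge reversed.

```scheme
;; Hypothetical representation: an edge is a pair (from . to).
;; In the "depends on" reading, (bcftools_call . samsort) means that
;; bcftools_call depends on samsort.
(define depends-on-edges
  '((bcftools_call . samsort)
    (bcftools_call . samindex)))

;; Flip every edge to obtain the data-flow reading, where
;; (samsort . bcftools_call) means the output of samsort feeds bcftools_call.
(define (flip-edges edges)
  (map (lambda (edge) (cons (cdr edge) (car edge))) edges))

(define data-flow-edges (flip-edges depends-on-edges))
```

Flipping twice gives back the original list, so neither notation carries more
information than the other.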


Speaking of cosmetics, take the example from the graph [3]. I
find more readable:

1.
(graph
   (samsort samindex -> bcftools_call))

than
2.
 (graph
   (bcftools_call <- samsort samindex))

than
3.
 (graph
   (bcftools_call -> samsort samindex))


I do not know; I feel like I am cutting a hair into four pieces. :-)
(a French expression for splitting hairs :-)

[1] https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#step-4-indexing-read-alignments-and-visualizing-the-dag-of-jobs
[2] https://view.commonwl.org/workflows/github.com/common-workflow-language/cwltool/blob/master/cwltool/schemas/v1.0/v1.0/step-valuefrom3-wf.cwl
[3] https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#id1


> > From a simple user perspective, I find more readable the current
> > version with `graph'. Because I am able to see the flow even if I do
> > not know about the processes fry, bake and take.
>
> Right.  I also prefer the explicit “graph” syntax.  With “link”
> (formerly “connect”) it’s *possible* but not required to automatically
> link up all of the processes.  I suspect that this is more in line with
> what Snakemake users might expect.

Instead of `link', why not `auto-link'?


> > From my point of view, the `let' part fixes the entry point or some
> > specific location of outputs (for debugging purpose?).
> >
> > (define (eat input output)
> >  (process
> >   (name "Eat")
> >   (data-inputs input)
> >   (outputs output)))
> >
> > (define (cook input output)
> >  (process
> >   (name "Cook")
> >   (data-inputs input)
> >   (outputs output)))
> >
> > (define (take input output)
> >  (process
> >   (name "Take")
> >   (data-inputs input)
> >   (outputs output)))
> >
> > (workflow
> >   (processes
> >     (let ((take-choc (inputs take "/path/to/chocolate"))
> >           (take-cake (outputs take "/path/to/store/cake"))
> >           (miam (outputs eat "/path/to/my/mouth")))
> >       (graph
> >         (cook -> take-choc)
> >         (take-cake -> cook)
> >         (miam -> take-cake)))))
> >
> > If the inputs/outputs are not specified in the `let' part, then they
> > are automatically stored somewhere in /tmp/ or elsewhere and then
> > (optionally) removed when all the workflow is done.
> >
> > I imagine `inputs'/`outputs' returning a curried process, somehow.
> >
> > And similarly about options, e.g,
> >  (define* (cook input output #:optional temp-woven)
> >      blah)
> >
> >
> > Does it make sense ?
>
> This seems to be from the perspective of data flow as you indicated
> earlier.  I’m not sure I fully understand it, but I give it a try.  (To
> me it seems similar to continuations.)

I am not familiar with continuations, but yes, it seems similar now that you say it. :-)


Thank you for taking the time to give it a try.


> Expressed as a data flow the workflow looks like this:
>
>   (take "chocolate") => cook => (take "cake") => miam
>
> At each step we generate a value that can be processed by the next
> step.  This looks suspiciously like an Arrow[1].

You expressed my thoughts better than I did. :-)

>
> [1]: https://www.haskell.org/arrows/syntax.html
>
>   (push "chocolate"
>     (>>> take cook take miam))
>
> i.e. we push the value “chocolate” into a chain where a procedure’s
> outputs are connected to the next procedure’s inputs.
>
> The example makes it a bit hard to think about this clearly — what about
> the second invocation of “take”?  What about multiple inputs?  Isn’t
> this just function composition and application?

I agree that multiple inputs or outputs would be an issue when composing.

Say that `take' takes 2 inputs, `a' and `b'. We could require that they
be packed as a list (a b), and the writer of the process would have to
unpack them.
Now say that `cook' returns 3 outputs, `x', `y' and `z'. They
are also packed as a list.
However, how do we encode the fact that `a' corresponds to `z' and `b' to `y'?

You somehow need a dummy process that unpacks and repacks, so that the
"types" of adjacent processes agree.

(push
 (>>> take cook dumb take miam))

(define (dumb input output)
  (data-inputs ((u (cadr input))     ; second output of cook, i.e. `y'
                (v (caddr input))))  ; third output of cook, i.e. `z'
  (outputs (v u)))


I do not know whether it makes sense, or whether it is usable and better.
I just find it more "functional".
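
The unpack/repack idea can be tried out in plain Guile Scheme, outside of the
GWL. This is only a sketch under assumptions: here `>>>' is an ordinary
left-to-right composition of one-argument procedures, and `cook', `take' and
`dumb' are hypothetical stand-ins that pass lists of strings around instead
of real processes.

```scheme
;; Left-to-right composition: (>>> f g h) applies f first, then g, then h.
(define (>>> . procs)
  (lambda (x)
    (let loop ((value x) (remaining procs))
      (if (null? remaining)
          value
          (loop ((car remaining) value) (cdr remaining))))))

;; Hypothetical stand-in: `cook' returns three outputs packed as a list.
(define (cook input)
  (list (string-append input ".x")
        (string-append input ".y")
        (string-append input ".z")))

;; Hypothetical stand-in: `take' expects two inputs packed as a list (a b).
(define (take inputs)
  (string-append "took " (car inputs) " and " (cadr inputs)))

;; Adapter: keep the third and second outputs, in that order,
;; so that `a' receives `z' and `b' receives `y'.
(define (dumb outputs)
  (list (caddr outputs) (cadr outputs)))

(define pipeline (>>> cook dumb take))
(define result (pipeline "chocolate"))
```

Without `dumb' in the chain, `take' would silently pick up `x' and `y', which
is exactly the kind of "type" mismatch the adapter is meant to make explicit.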


>
> x >- A -> B -> C -> E -> F
>      |    `--> D ------/
>      `-------/
>
> x is the input to the data flow.
>
>     (flow (x)
>       (a <- (A x))     ; apply A and bind output to “a”
>       (b <- (B a))     ; apply B and bind output to “b”
>       (e <- (>>> C E)) ; apply C and then E, bind the output to “e”
>       (d <- (D a b))   ; apply D and bind the output to “d”
>       (-> (F e d)))    ; return F applied to “e” and “d”
>
> “flow” would somehow figure out in what order to run things.  I feel
> that there should be a better way to express this, but I haven’t found
> one.

Yes. This is already nice! :-)
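
How `flow' could figure out the order is, as far as I can tell, a topological
sort over the bindings. Here is a minimal sketch in plain Guile Scheme; the
list representation (name followed by the names it depends on) and the
procedure `run-order' are assumptions for illustration, and I assume that `e'
depends on `b', as the diagram suggests.

```scheme
(use-modules (srfi srfi-1))  ; find, every

;; Hypothetical encoding of the flow: (name dependency ...).
;; The bindings are deliberately listed in the "wrong" order.
(define bindings
  '((f e d) (d a b) (e b) (b a) (a x)))

;; Repeatedly pick the first binding whose dependencies are all available.
;; INPUTS lists the names that are available from the start.
(define (run-order bindings inputs)
  (let loop ((pending bindings) (done inputs) (order '()))
    (if (null? pending)
        (reverse order)
        (let ((next (find (lambda (binding)
                            (every (lambda (dep) (memq dep done))
                                   (cdr binding)))
                          pending)))
          (if (not next)
              (error "cyclic or unsatisfied dependencies:" pending)
              (loop (delq next pending)
                    (cons (car next) done)
                    (cons (car next) order)))))))

(define order (run-order bindings '(x)))
```

Any order that respects the dependencies would do; this sketch just returns
the first one it finds, and signals an error when no binding is ready.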


And the user does not have to manage the names of all the outputs by hand.
In other words, say the user has already computed your workflow with
`x' set to /path/to/my-file.
Then this user writes another flow:
 (flow (x)
  (z <- (>>> A B x))
  (-> (G z)))
When this second flow is applied to /path/to/my-file, the result `z'
is already in the CAS (see `b') and only (G z) is computed.
The dream would be:
 (flow (x)
   (-> ((>>> A B G) x)))
and to automatically detect that the composition `B . A' has already been
computed for the value /path/to/my-file.
Well, I am dreaming... :-)
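
To illustrate the dream, here is a toy sketch in plain Guile Scheme. It is
not how the GWL or the Guix store works: the hash table `cache' merely stands
in for a CAS, a step is a hypothetical (name . procedure) pair, and results
are keyed on the names of the steps applied so far together with the input.

```scheme
;; Hypothetical cache standing in for a content-addressed store.
;; Keys are (step-names-so-far . input); values are computed results.
(define cache (make-hash-table))
(define computed '())   ; log of the steps that were actually run

;; Run STEPS from left to right on INPUT, reusing cached prefixes.
;; Limitation of this sketch: a cached result of #f counts as a miss.
(define (run-chain steps input)
  (let loop ((steps steps) (names '()) (value input))
    (if (null? steps)
        value
        (let* ((name  (car (car steps)))
               (proc  (cdr (car steps)))
               (names (cons name names))
               (key   (cons names input))
               (hit   (hash-ref cache key)))
          (loop (cdr steps)
                names
                (or hit
                    (let ((result (proc value)))
                      (set! computed (cons name computed))
                      (hash-set! cache key result)
                      result)))))))

;; Hypothetical steps that only append a suffix to a file name.
(define A (cons 'A (lambda (x) (string-append x ".a"))))
(define B (cons 'B (lambda (x) (string-append x ".b"))))
(define G (cons 'G (lambda (x) (string-append x ".g"))))

(run-chain (list A B) "/path/to/my-file")    ; runs A and B
(set! computed '())                          ; reset the log
(define final (run-chain (list A B G) "/path/to/my-file"))
```

After the second call, `computed' contains only G, because the prefix
`B . A' was found in the cache for /path/to/my-file.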


All the best,
simon


