gwl-devel

Re: Managing data files in workflows


From: Ricardo Wurmus
Subject: Re: Managing data files in workflows
Date: Fri, 26 Mar 2021 14:13:16 +0100
User-agent: mu4e 1.4.14; emacs 27.1

Konrad Hinsen <konrad.hinsen@fastmail.net> writes:

>> This works for me correctly:
>
> Thanks for looking into this! For me, your change makes no difference.
> Nor should it, because in my setup the "data" directory already exists.
> I still get an error message about the already existing file.
>
> Maybe it's time to switch to the development version of GWL!

Hmm, I don’t see any commits since 0.3.0 that would affect the cache
implementation.  GWL computes cache hashes for all processes and the
processes they depend on.  In your case it’s trivial: there’s just one
process.  The process definition is hashed and looked up in the cache
to see if there is any output for the given process hash.

In my test case this file exists:

    
    /tmp/gwl/lf6uca7zcyyldkcrxn3zwc275ax3ip676aqgjo75ybwojtl4emoq/data/weekly-incidence.csv

/tmp/gwl is the cache prefix, and the hash corresponds to the process.
Since data/weekly-incidence.csv exists and that’s the only declared
output, GWL decides not to compute the output again.
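
To make the lookup concrete, here is a minimal Python sketch of the
idea.  It is not GWL's actual code (GWL is written in Guile Scheme):
process_hash and cached_outputs_exist are hypothetical names, and
SHA-256 over a definition string merely stands in for however GWL
really derives the process hash.

```python
import hashlib
import os
import tempfile


def process_hash(definition):
    """Hypothetical stand-in for the process hash (GWL's real scheme differs)."""
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()


def cached_outputs_exist(cache_prefix, definition, outputs):
    """Return True if every declared output exists under <prefix>/<hash>/."""
    root = os.path.join(cache_prefix, process_hash(definition))
    return all(os.path.exists(os.path.join(root, out)) for out in outputs)


# Demo with a throwaway cache prefix and a made-up process definition.
with tempfile.TemporaryDirectory() as prefix:
    definition = "fetch weekly incidence data"
    outputs = ["data/weekly-incidence.csv"]

    # Nothing is cached yet, so the process would have to run.
    print(cached_outputs_exist(prefix, definition, outputs))

    # Simulate a previous run by creating the declared output in the cache.
    target = os.path.join(prefix, process_hash(definition), outputs[0])
    os.makedirs(os.path.dirname(target))
    open(target, "w").close()

    # The only declared output is now present, so the process would be skipped.
    print(cached_outputs_exist(prefix, definition, outputs))
```

The point of the sketch is only that the decision depends on two
things: the hash of the process and the presence of its declared
outputs below the cache prefix.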

At least that happens in my case.  I wonder why it doesn’t work in your
case.

> However, what I had in mind with my question is the management of
> intermediate results in my workflow, especially in its development
> phase. If I change my workflow file, or a script that it calls,
> I'd want only the affected steps to be recomputed. That's not much
> of an issue for my current test case, but I have bigger dreams for
> the future ;-)

Yes, that’s the way it’s supposed to work already.  GWL computes the
hashes of each chain of processes, which includes the generated process
script, its inputs, and the hashes of all processes that lead up to this
process.  Any change in the chain will lead to a new hash and thus a
cache miss, leading GWL to recompute.
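
The chaining can be illustrated with a small Python sketch.  Again,
this is only an approximation under stated assumptions, not GWL's
implementation: chain_hash is a hypothetical name, SHA-256 is an
arbitrary choice, and scripts, inputs, and upstream hashes are all
represented as plain strings.

```python
import hashlib


def chain_hash(script, inputs, upstream_hashes):
    """Hash a process together with everything it depends on.

    Each group and each item is length-prefixed so that different
    combinations cannot collapse into the same byte stream.
    """
    h = hashlib.sha256()
    for group in ([script], list(inputs), list(upstream_hashes)):
        h.update(len(group).to_bytes(8, "big"))
        for item in group:
            data = item.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()


# A made-up two-step workflow: "fetch" feeds into "plot".
fetch_v1 = chain_hash("fetch script v1", ["source-url"], [])
plot_v1 = chain_hash("plot script", ["data.csv"], [fetch_v1])

# Editing only the first step changes its hash ...
fetch_v2 = chain_hash("fetch script v2", ["source-url"], [])
# ... and the downstream hash changes too, although "plot" is untouched.
plot_v2 = chain_hash("plot script", ["data.csv"], [fetch_v2])

print(fetch_v1 != fetch_v2)  # the edited step misses the cache
print(plot_v1 != plot_v2)    # so does every step downstream of it
```

Steps that do not depend on the edited process keep their old hash
and would therefore still be found in the cache.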

-- 
Ricardo


