gwl-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Managing data files in workflows


From: Konrad Hinsen
Subject: Managing data files in workflows
Date: Thu, 25 Mar 2021 10:57:27 +0100

Hi everyone,

Coming from make-like workflow systems, I wonder how data files are best
managed in GWL workflow. GWL is clearly less file-centric than make
(which is a Good Thing in my opinion), but at a first reading of the
manual, it doesn't seem to care about files at all, except for
auto-connect.

A simple example:

==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # { wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv }

workflow influenza-incidence
  processes download
==================================================

This works fine the first time, but the second time it fails because the
output file of the process already exists. This doesn't look very
useful. The two behaviors I do see as potentially useful are

 1) Always replace the file.
 2) Don't run the process if the output file already exists
    (as make would do by default)

I can handle this in my bash code of course, but that becomes lengthy
even for this trivial case:

==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # {
      rm {{outputs}}
      wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
    }
==================================================

==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # {
      test -f {{outputs}} || wget -O {{outputs}} 
http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
    }
==================================================

Is there a better solution?

Cheers,
  Konrad.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]