Managing data files in workflows
From: Konrad Hinsen
Subject: Managing data files in workflows
Date: Thu, 25 Mar 2021 10:57:27 +0100
Hi everyone,
Coming from make-like workflow systems, I wonder how data files are best
managed in GWL workflows. GWL is clearly less file-centric than make
(which is a Good Thing in my opinion), but on a first reading of the
manual, it doesn't seem to care about files at all, except for
auto-connect.
A simple example:
==================================================
process download
packages "wget"
outputs
file "data/weekly-incidence.csv"
# { wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv }
workflow influenza-incidence
processes download
==================================================
This works fine the first time, but the second run fails because the
output file of the process already exists. That doesn't look very
useful. The two behaviors I do see as potentially useful are
1) Always replace the file.
2) Don't run the process if the output file already exists
(as make would do by default)
I can handle this in my bash code of course, but that becomes lengthy
even for this trivial case:
==================================================
process download
packages "wget"
outputs
file "data/weekly-incidence.csv"
# {
rm -f {{outputs}}
wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
}
==================================================
==================================================
process download
packages "wget"
outputs
file "data/weekly-incidence.csv"
# {
test -f {{outputs}} || wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
}
==================================================
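The repetition could at least be factored into a small shell helper that
every process sources (a sketch only; the function names are my own, not
anything from GWL):

```shell
#!/bin/sh
# fetch_fresh OUT CMD...      -- behavior 1: remove OUT, then always run CMD
fetch_fresh() {
    out=$1; shift
    rm -f "$out"
    "$@"
}

# fetch_if_missing OUT CMD... -- behavior 2: run CMD only if OUT is missing
fetch_if_missing() {
    out=$1; shift
    [ -f "$out" ] || "$@"
}
```

A process body would then reduce to something like
`fetch_if_missing {{outputs}} wget -O {{outputs}} <url>`, but that still
leaves the file-management policy scattered across the shell code of each
process rather than expressed in the workflow itself.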
Is there a better solution?
Cheers,
Konrad.