Managing data files in workflows
From: Konrad Hinsen
Subject: Managing data files in workflows
Date: Thu, 25 Mar 2021 10:57:27 +0100
Hi everyone,
Coming from make-like workflow systems, I wonder how data files are best
managed in GWL workflows. GWL is clearly less file-centric than make
(which is a Good Thing in my opinion), but on a first reading of the
manual, it doesn't seem to care about files at all, except for
auto-connect.
A simple example:
==================================================
process download
packages "wget"
outputs
file "data/weekly-incidence.csv"
# { wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv }
workflow influenza-incidence
processes download
==================================================
This works fine the first time, but the second run fails because the
output file of the process already exists. That doesn't look very
useful. The two behaviors I do see as potentially useful are
1) Always replace the file.
2) Don't run the process if the output file already exists
(as make would do by default)
I can handle this in my bash code of course, but that becomes lengthy
even for this trivial case:
==================================================
process download
packages "wget"
outputs
file "data/weekly-incidence.csv"
# {
rm -f {{outputs}}
wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
}
==================================================
==================================================
process download
packages "wget"
outputs
file "data/weekly-incidence.csv"
# {
test -f {{outputs}} || wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
}
==================================================
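The repetition could at least be factored into a small shell helper that
every process sources (a sketch only; the function names are my own, not
anything from GWL):

```shell
#!/bin/sh
# fetch_fresh OUT CMD...      -- behavior 1: remove OUT, then always run CMD
fetch_fresh() {
    out=$1; shift
    rm -f "$out"
    "$@"
}

# fetch_if_missing OUT CMD... -- behavior 2: run CMD only if OUT is missing
fetch_if_missing() {
    out=$1; shift
    [ -f "$out" ] || "$@"
}
```

A process body would then reduce to something like
`fetch_if_missing {{outputs}} wget -O {{outputs}} <url>`, but that still
leaves the file-management policy scattered across the shell code of each
process rather than expressed in the workflow itself.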
Is there a better solution?
Cheers,
Konrad.