Re: [gwl-devel] Next steps for the GWL
From: Pjotr Prins
Subject: Re: [gwl-devel] Next steps for the GWL
Date: Thu, 6 Jun 2019 09:06:59 -0500
User-agent: NeoMutt/20170113 (1.7.2)

We should also assess this:
https://labs.eleks.com/2019/03/ipfs-network-data-replication.html
On Thu, Jun 06, 2019 at 08:44:04AM -0500, Pjotr Prins wrote:
> IPFS is meant for data sharing and reproducibility. It also allows for
> private networks which is rather important.
>
> Scalability of IPFS is a concern, so either we cache using IPFS or we
> have some other caching mechanism.
>
> git-annex is too much of a hack in my book. It also does not scale
> that well.
>
> Pj.
>
> On Thu, Jun 06, 2019 at 12:55:52PM +0200, zimoun wrote:
> > Hi,
> >
> > On Thu, 6 Jun 2019 at 12:11, Ricardo Wurmus
> > <address@hidden> wrote:
> >
> > > > One of the things I'd love to do
> > > > with GWL is to make it play well with git-annex, something that would
> > > > almost certainly be too specific for GWL itself. For example
> > > >
> > > > * Make data caching git-annex aware. When deciding to recompute data
> > > > files, GWL avoids computing the hash of data files, using scripts as
> > > > the cheaper proxy, as you described in address@hidden
> > > > But if the user is tracking data files with git-annex, getting the
> > > > hash of data files becomes less expensive because we can ask
> > > > git-annex for the hash it has already computed.
> > > >
> > > > * Support getting annex data files on demand (i.e. 'git annex get') if
> > > > they are needed as inputs.
> > >
> > > I wonder what the protocol should look like. Should a workflow
> > > explicitly request a “git annex” file or should it be up to the person
> > > running the workflow, i.e. when “git annex” has been configured to be
> > > the cache backend it would simply look up the declared input/output
> > > files there.
> > >
> > > I suppose the answers would equally apply to using IPFS as a cache.
> >
> > I agree that a mechanism such as `git-annex` would be nice.
> > But is it not a means for the CAS that we previously discussed?
> >
> > I fully agree with the features and their description. Totally cool!
> > However, I am a bit reluctant about `git-annex` because it requires a
> > Haskell compiler and it is far, far from bootstrappable. I am aware
> > of Ricardo's attempt, which is AFAIK the only one. See [1] for an
> > explanation by one Haskeller.
> >
> > My opinion: GWL should stay on the path of Reproducibility,
> > end-to-end. So `git-annex` should be a transitional step, while the
> > Haskell bootstrap is not solved, as a means for the CAS (cache), and I
> > would find it more elegant to use the "data-oriented IPFS": IPLD [2].
> >
> >
> > [1] https://www.joachim-breitner.de/blog/748-Thoughts_on_bootstrapping_GHC
> > [2] https://ipld.io/
> >
> >
> > All the best,
> > simon
> >
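
The idea quoted above, asking git-annex for the hash it has already
computed instead of rehashing a data file, can be sketched as follows.
This is only an illustration, not GWL code: the names AnnexKey,
parse_annex_key and lookup_annex_key are hypothetical. It assumes the
file is tracked with a hash backend such as SHA256E, that the key has
the usual BACKEND-sSIZE--NAME shape, and that `git annex lookupkey` is
available on PATH.

```python
import re
import subprocess
from typing import NamedTuple, Optional


class AnnexKey(NamedTuple):
    """Parts of a git-annex key that a cache could reuse (hypothetical type)."""
    backend: str
    size: Optional[int]
    digest: str


# Assumed key shape: BACKEND, optional -<letter><number> fields, "--", then the name.
_KEY_RE = re.compile(
    r"^(?P<backend>[A-Z0-9_]+)(?P<fields>(?:-[a-zA-Z][0-9]+)*)--(?P<name>.+)$"
)


def parse_annex_key(key: str) -> AnnexKey:
    """Split a key such as 'SHA256E-s1234--<hex>.gz' into its parts."""
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a recognised git-annex key: {key!r}")
    backend = match.group("backend")
    size = None
    for field in match.group("fields").split("-"):
        if field.startswith("s"):  # the -s field carries the file size
            size = int(field[1:])
    name = match.group("name")
    # Assumption: backends ending in "E" append the file extension to the hash.
    digest = name.split(".", 1)[0] if backend.endswith("E") else name
    return AnnexKey(backend, size, digest)


def lookup_annex_key(path: str, repo: str = ".") -> Optional[AnnexKey]:
    """Ask git-annex for the key of PATH; return None if the file is not annexed."""
    result = subprocess.run(
        ["git", "annex", "lookupkey", path],
        cwd=repo, capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    return parse_annex_key(result.stdout.strip())
```

A caller could fall back to hashing the file itself (or to the
script-based proxy) whenever lookup_annex_key returns None, so
workflows without git-annex keep working unchanged.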