[bug#33899] [PATCH 0/5] Distributing substitutes over IPFS


From: Hector Sanjuan
Subject: [bug#33899] [PATCH 0/5] Distributing substitutes over IPFS
Date: Fri, 18 Jan 2019 11:26:18 +0000

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, January 18, 2019 10:52 AM, Ludovic Courtès <address@hidden> wrote:

> Hello,
>
> Hector Sanjuan address@hidden skribis:
>
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Monday, January 14, 2019 2:17 PM, Ludovic Courtès address@hidden wrote:
>
> [...]
>

>
> Isn’t there a way, then, to achieve the same behavior with the custom
> format? The /api/v0/add entry point has a ‘pin’ argument; I suppose we
> could leave it to false except when we add the top-level “directory”
> node? Wouldn’t that give us behavior similar to that of Unixfs?
>

Yes. What you could do is add every file flatly/separately (with pin=false)
and, at the end, add an IPLD object that references all the files you added
and includes the exec-bit information (and size?). This is just a JSON file:

{
    "name": "package name",
    "contents": [
        {
            "path": "/file/path", # so you know where to extract it later
            "exec": true,
            "ipfs": { "/": "Qmhash..." }
        },
        ...
    ]
}

This needs to be added to IPFS with the /api/v0/dag/put endpoint (which
converts it to CBOR - IPLD-CBOR is the actual block format used here).
When this object is pinned (?pin=true), it will recursively pin everything
referenced from it, which is exactly the behavior we want.
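
To make that concrete, here is a rough Python sketch of the flow against a
local daemon. This is only a sketch: the parameter names (pin), the Hash and
Cid fields in the responses, and the file names used are assumptions based on
the go-ipfs HTTP API and may need adjusting for your daemon version.

import json
import requests

API = "http://localhost:5001/api/v0"   # local IPFS daemon, assumed default port

def add_file(path):
    """Add one file with pin=false and return the hash reported by the daemon."""
    with open(path, "rb") as f:
        r = requests.post(API + "/add", params={"pin": "false"},
                          files={"file": f})
    r.raise_for_status()
    return r.json()["Hash"]

def put_manifest(name, entries):
    """Store the JSON manifest via dag/put (stored as dag-cbor by default)
    and pin it; pinning the manifest recursively pins every file it links to."""
    manifest = {"name": name, "contents": entries}
    r = requests.post(API + "/dag/put", params={"pin": "true"},
                      files={"file": json.dumps(manifest)})
    r.raise_for_status()
    return r.json()["Cid"]["/"]

# Hypothetical usage for a store item with two files:
entries = [{"path": "/bin/hello", "exec": True,
            "ipfs": {"/": add_file("hello")}},
           {"path": "/share/doc/README", "exec": False,
            "ipfs": {"/": add_file("README")}}]
print(put_manifest("hello-2.10", entries))

Note that only the manifest ends up in the pinset (one entry per package),
while every block it references is still protected from garbage collection.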

So this will be quite similar to unixfs. But note that if this blob
ever grows beyond the 2M block-size limit because a package has many
files, you will need to start solving problems that unixfs already
solves automatically (directory sharding).

Because IPLD-CBOR is supported, ipfs, the gateway, etc. will know how to
display these manifests, the information in them, and their links.


> > When the user puts the single root hash in ipfs.io/ipfs/<hash>, it
> > will display correctly the underlying files and the people will be
> > able to navigate the actual tree with both web and cli.
>
> Right, though that’s less important in my view.
>
> > Note that every file added to IPFS is getting wrapped as a Unixfs
> > block anyways. You are just saving some "directory" nodes by adding
> > them separately.
>
> Hmm weird. When I do /api/v0/add, I’m really just passing a byte
> vector; there’s no notion of a “file” here, AFAICS. Or am I missing
> something?

They are wrapped in Unixfs blocks by default anyway. As soon as a file
is >256K it gets chunked into several pieces, and a Unixfs block (or
several, for a really big file) is needed to reference them. In that
case the root hash will be a Unixfs node with links to the parts.

There is a "raw-leaves" option which does not wrap the individual
blocks with unixfs, so if the file is small to not be chunked,
you can avoid the default unixfs-wrapping this way.
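
For illustration, a minimal sketch of passing raw-leaves through the HTTP
API (the parameter name mirrors the go-ipfs CLI flag; treat it as an
assumption to verify against your daemon version):

import requests

with open("small-file", "rb") as f:   # hypothetical small file
    r = requests.post("http://localhost:5001/api/v0/add",
                      params={"pin": "false", "raw-leaves": "true"},
                      files={"file": f})
r.raise_for_status()
print(r.json()["Hash"])   # the file is stored as a raw block, not a Unixfs node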


>
> > > > It will probably need some trial an error to get the multi-part right
> > > > to upload all in a single request. The Go code HTTP Clients doing
> > > > this can be found at:
> > > > https://github.com/ipfs/go-ipfs-files/blob/master/multifilereader.go#L96
> > > > As you see, a directory part in the multipart will have the 
> > > > content-type Header
> > > > set to "application/x-directory". The best way to see how "abspath" etc 
> > > > is set
> > > > is probably to sniff an `ipfs add -r <testfolder>` operation 
> > > > (localhost:5001).
> > > > Once UnixFSv2 lands, you will be in a position to just drop the sexp 
> > > > file
> > > > altogether.
> > >
> > > Yes, that makes sense. In the meantime, I guess we have to keep using
> > > our own format.
> > > What are the performance implications of adding and retrieving files one
> > > by one like I did? I understand we’re doing N HTTP requests to the
> > > local IPFS daemon where “ipfs add -r” makes a single request, but this
> > > alone can’t be much of a problem since communication is happening
> > > locally. Does pinning each file separately somehow incur additional
> > > overhead?
> >
> > Yes, pinning separately is slow and incurs overhead. Pins are stored
> > in a merkle tree themselves, so pinning involves reading, patching, and
> > saving that tree. This gets quite slow when you have very large pinsets,
> > because your pin blocks grow. Your pinset will grow very large if you do
> > this. Additionally, the pinning operation itself requires a global lock,
> > making it even slower.
>
> OK, I see.

I should add that even if you want to /add all files separately (and then
put the IPLD manifest I described above), you can still add them all in the
same request (it is actually easier, since you just need to put more parts
in the multipart and don't have to worry about names/folders/paths).
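
A rough sketch of such a batched request, assuming the daemon answers with
one JSON object per line, one per added file (the file names below are
hypothetical):

import json
import requests

names = ["hello", "README", "libhello.so"]
parts = [("file", (name, open(name, "rb"))) for name in names]
r = requests.post("http://localhost:5001/api/v0/add",
                  params={"pin": "false"}, files=parts)
r.raise_for_status()
hashes = {entry["Name"]: entry["Hash"]
          for entry in (json.loads(line) for line in r.text.splitlines())}
print(hashes)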

The /add endpoint will forcefully close the HTTP connection after every
/add (long story), and small delays might add up to a big one. This is
especially relevant when using IPFS Cluster, where /add might send the
blocks somewhere else and needs to do some other things.


>
> > But, even if it were fast, you would not have a way to easily unpin
> > anything that becomes obsolete, or an overview of where things belong.
> > It is also unlikely that a single IPFS daemon will be able to store
> > everything you build, so you might find yourself using IPFS Cluster
> > soon to distribute the storage across multiple nodes, and then you will
> > effectively be adding remotely.
>
> Currently, ‘guix publish’ stores things as long as they are requested,
> and then for the duration specified with ‘--ttl’. I suppose we could
> have similar behavior with IPFS: if an item hasn’t been requested for
> the specified duration, then we unpin it.
>
> Does that make sense?

Yes, in fact I wanted IPFS Cluster to support a TTL too, so that things are
automatically unpinned when it expires.
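
A minimal sketch of that TTL idea on the publisher side, assuming some
bookkeeping of when each item was last requested (the last_requested value
and the 30-day TTL are hypothetical; /api/v0/pin/rm is the daemon endpoint
that removes a recursive pin):

import time
import requests

TTL = 30 * 24 * 3600   # e.g. 30 days, mirroring a 'guix publish --ttl' setting

def maybe_unpin(cid, last_requested):
    """Unpin the manifest CID if it has not been requested within TTL."""
    if time.time() - last_requested > TTL:
        r = requests.post("http://localhost:5001/api/v0/pin/rm",
                          params={"arg": cid})
        r.raise_for_status()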

>
> Thanks for your help!
>
> Ludo’.

Thanks!

Hector




