guix-devel

Experimental nar-herder support for serving fixed output files by hash


From: Christopher Baines
Subject: Experimental nar-herder support for serving fixed output files by hash
Date: Fri, 24 Jun 2022 09:10:00 +0100
User-agent: mu4e 1.6.10; emacs 28.1

Hey!

The nar-herder helps with managing a collection of nars. There's some
overlap with the functionality of guix publish, in that both tools can
serve narinfo files, which is a key part of providing substitutes.

One thing that guix publish does aside from serving narinfo files is
providing access to files in the store produced by fixed output
derivations. Package sources are fixed output derivations, so this
basically means single file package sources, like tar files.

Using ci.guix.gnu.org as an example, this looks like:

  https://ci.guix.gnu.org/file/0ad-0.0.25b-alpha.tar.xz/sha256/1p9fa8f7sjb9c5wl3mawzyfqvgr614kdkhrj2k4db9vkyisws3fp

You can request a file from the store if you know its name and
hash. In guix publish, this works by computing the
/gnu/store/... filename for a file with this name and hash, and then
serving it if it exists. Additionally, on ci.guix.gnu.org, there's some
nginx caching in front, so some files may still be available even if
they've been removed from the store.
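
For illustration only, here is a rough Python sketch of how a
/gnu/store/... filename can be derived from a name and a sha256 hash.
It is not the guix publish code (which is written in Guile); it assumes
the usual Nix/Guix scheme for "flat" (non-recursive) fixed-output files,
and the function names are made up for this example.

```python
import hashlib

STORE_DIR = "/gnu/store"
# Nix-style base32 alphabet: digits and lowercase letters without e, o, t, u.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_encode(data):
    """Encode bytes in nix-base32 (5-bit groups, most significant first)."""
    length = (len(data) * 8 - 1) // 5 + 1
    n = int.from_bytes(data, "little")
    return "".join(NIX_BASE32[(n >> (5 * i)) & 31]
                   for i in reversed(range(length)))


def nix_base32_decode(text, size):
    """Decode a nix-base32 string into SIZE bytes."""
    n = 0
    for ch in text:
        n = n * 32 + NIX_BASE32.index(ch)
    return n.to_bytes(size, "little")


def compress_hash(digest, size=20):
    """Fold a digest down to SIZE bytes by XORing the bytes together."""
    out = bytearray(size)
    for i, byte in enumerate(digest):
        out[i % size] ^= byte
    return bytes(out)


def store_path(kind, digest, name):
    """Assumed store path scheme: hash a descriptor string, then fold it."""
    descriptor = f"{kind}:sha256:{digest.hex()}:{STORE_DIR}:{name}"
    h = hashlib.sha256(descriptor.encode()).digest()
    return f"{STORE_DIR}/{nix_base32_encode(compress_hash(h))}-{name}"


def flat_fixed_output_path(name, sha256_digest):
    """Store filename for a single file with the given name and sha256."""
    tag = f"fixed:out:sha256:{sha256_digest.hex()}:"
    return store_path("output:out",
                      hashlib.sha256(tag.encode()).digest(), name)


# Name and hash taken from the example URL above.
url_hash = "1p9fa8f7sjb9c5wl3mawzyfqvgr614kdkhrj2k4db9vkyisws3fp"
digest = nix_base32_decode(url_hash, 32)
path = flat_fixed_output_path("0ad-0.0.25b-alpha.tar.xz", digest)
```

A server following this approach would then only need to check whether
`path` exists on disk before serving it.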

With the nar-herder, the implementation is a little trickier. Since the
nar-herder manages a collection of nars, rather than serving things from
the store, it might have the file being requested but it's inside a
probably compressed nar file. So, to respond to these requests, the
nar-herder has to take the relevant nar file and then read the file out
of it. I've now got an initial implementation of this:
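
To give an idea of what reading a file out of a nar involves, here is a
minimal Python sketch. It is not the nar-herder implementation (that is
Guile code, see the commit linked in this message); it assumes the nar
holds a single regular file, that the stream has already been
decompressed, and that the nar layout is a sequence of length-prefixed
strings padded to 8 bytes. The function names are hypothetical.

```python
import io
import struct

NAR_MAGIC = b"nix-archive-1"


def read_exact(stream, n):
    """Read exactly N bytes, or fail if the stream ends early."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise ValueError("truncated nar")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_nar_string(stream):
    """Read one nar string: 64-bit little-endian length, data, padding."""
    (length,) = struct.unpack("<Q", read_exact(stream, 8))
    data = read_exact(stream, length)
    read_exact(stream, -length % 8)  # skip padding up to a multiple of 8
    return data


def expect(stream, token):
    found = read_nar_string(stream)
    if found != token:
        raise ValueError(f"expected {token!r}, found {found!r}")


def copy_single_file_from_nar(src, dst, chunk_size=64 * 1024):
    """Copy the contents of a single-file nar from SRC to DST in chunks.

    Returns the number of bytes copied.
    """
    expect(src, NAR_MAGIC)
    expect(src, b"(")
    expect(src, b"type")
    if read_nar_string(src) != b"regular":
        raise ValueError("nar does not hold a single regular file")
    token = read_nar_string(src)
    if token == b"executable":
        expect(src, b"")
        token = read_nar_string(src)
    if token != b"contents":
        raise ValueError(f"expected b'contents', found {token!r}")
    # The contents are one nar string; copy it without holding it all
    # in memory at once.
    (size,) = struct.unpack("<Q", read_exact(src, 8))
    remaining = size
    while remaining:
        chunk = read_exact(src, min(chunk_size, remaining))
        dst.write(chunk)
        remaining -= len(chunk)
    read_exact(src, -size % 8)
    expect(src, b")")
    return size
```

For a compressed nar, `src` could be a decompressing file object, for
example one returned by `lzma.open` or `gzip.open` from the Python
standard library. Copying chunk by chunk keeps memory use bounded
regardless of how large the file is.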

  https://git.cbaines.net/guix/nar-herder/commit/?id=042f49e5fb52ea844ed5d29c17b26fbc8ad49f0e

The code isn't great, and there's some difficulty in extracting the single
file from the nar, but the biggest problem is a limitation in the guile
fibers web server. Currently, responses have to be read into memory,
which is fine for web pages, but not great if you're trying to serve
files which can be multiple gigabytes in size. This also means that the
first byte of the response only becomes available once all the bytes
are available, so the download is slow to start.

With all of that said though, it does seem to work. For testing, I've
enabled it on bishan, which serves the bordeaux.guix.gnu.org collection
of nars. It only has IPv6 connectivity, so you'll only be able to try
this out if you've got IPv6 support locally:

  https://bishan.guix.gnu.org/file/0ad-0.0.25b-alpha.tar.xz/sha256/1p9fa8f7sjb9c5wl3mawzyfqvgr614kdkhrj2k4db9vkyisws3fp

In terms of next steps, there are some things to do with improving the
implementation, but it would be good to hear whether this is actually
worthwhile.

ci.guix.gnu.org is already used as a content-addressed mirror, although
given that there's a push to keep the store on berlin small, I'm not
sure how many files are actually available, or will be available in the
future. There's a 50G nginx cache, of which I think 7G is used, so this
feature is probably being used a bit at least.

In terms of what enabling this for the bordeaux.guix.gnu.org collection
of nars would look like, I think there are roughly 50,000 tarballs taking
up at least a tebibyte of space which would be downloadable [1]. These are
available as substitutes, but maybe there's value in making them
available this way as well?

Let me know what you think.

Thanks,

Chris


1:
sqlite> SELECT SUM(size) FROM narinfo_files WHERE url LIKE '%.tar.%';
1102376493623
sqlite> SELECT COUNT(*) FROM narinfo_files WHERE url LIKE '%.tar.%';
48326
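
As a quick check of those two figures, a few lines of Python convert
the byte total to tebibytes and work out the mean tarball size. The
variable names are only for this example; the numbers are the query
results above.

```python
total_bytes = 1102376493623   # SUM(size) from the query above
tarballs = 48326              # COUNT(*) from the query above

tib = total_bytes / 2**40                  # bytes -> tebibytes
avg_mib = total_bytes / tarballs / 2**20   # mean size per tarball, in MiB

print(f"{tib:.4f} TiB in total, {avg_mib:.1f} MiB per tarball on average")
```

The total comes out just over one tebibyte, which matches the "at least
a tebibyte" estimate.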
