guix-devel

Re: Update on bordeaux.guix.gnu.org


From: Christopher Baines
Subject: Re: Update on bordeaux.guix.gnu.org
Date: Fri, 03 Dec 2021 09:39:17 +0000
User-agent: mu4e 1.6.6; emacs 27.2

Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> I've been doing some performance tuning, submitting builds is now more
>> parallelised, a source of slowness when fetching builds has been
>> addressed, and one of the long queries involved in allocating builds has
>> been removed, which also improved handling of the WAL (Sqlite write
>> ahead log).
>>
>> There's also a few new features. Agents can be deactivated which means
>> they won't get any builds allocated. The coordinator now checks the
>> hashes of outputs which are submitted, a safeguard which I added because
>> the coordinator now also supports resuming the uploads of outputs. This
>> is particularly important when trying to upload large (> 1GiB) outputs
>> over slow connections.
>>
>> I also added a new x86_64 build machine. It's a 4 core Intel NUC that I
>> had sitting around, but I cleaned it up and got it building things. This
>> was particularly useful as I was able to use it to retry building
>> guile@3.0.7, which is extremely hard to build [2]. This was blocking
>> building the channel instance derivations for x86_64-linux.
>>
>> 2: 
>> https://data.guix.gnu.org/gnu/store/7k6s13bzbz5fd72ha1gx9rf6rrywhxzz-guile-3.0.7.drv
>
> Neat!  (Though I wouldn’t say building Guile is “extremely hard”,
> especially on x86_64.  :-))  The ability to keep retrying is much
> welcome.

To rephrase, I found it extremely hard to get that particular Guile
derivation to build successfully. It failed to build 12 times, and only
succeeded when I added new hardware to attempt it on (I'm guessing the
particular issue I was encountering was exacerbated by more cores).

Unfortunately, I also think that you finding it easy to build actually
contributes to the problem here, since it makes finding and addressing
issues like this harder.

>> Space is running out on bayfront, the machine that runs the coordinator,
>> stores all the nars and build logs, and serves the substitutes. I knew
>> this was probably going to be an issue, bayfront didn't have much space
>> to begin with, but I had hoped I'd be further forward in developing some
>> way to allow moving the nars around between multiple machines, to remove
>> the need to store all of them on bayfront. I have got a plan, there's
>> some ideas I mentioned back in February [4], but I haven't got around to
>> implementing anything yet. The disk space usage trend is pretty much
>> linear, so if things continue without any change, I think it will be
>> necessary to pause the agents within a month, to avoid filling up
>> bayfront entirely.
>
> Ah, bummer.  I hope we can find a solution one way or another.
> Certainly we could replicate nars on another machine with more disk,
> possibly buying the necessary hardware with the project funds.

Since this email got a bit delayed when I sent it, things have moved on
a bit now.

90% disk usage was the threshold I had in mind for bayfront, and that's
now pretty much been reached, so I've paused all the agents. My plans for
how to address this have also developed a bit, but it's still going to
take at least a month to get things going again.
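
To make the reasoning about the threshold concrete, here is a rough
sketch in Python. It is not how the coordinator or bayfront works. It
only shows the arithmetic behind "usage grows roughly linearly, pause at
90%". The function names, the sample format and the least-squares fit
are all assumptions made for illustration; only the 90% figure comes
from this thread.

```python
import shutil

# 90% is the threshold mentioned above; everything else here is hypothetical.
THRESHOLD = 0.90


def usage_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_pause_agents(fraction_used, threshold=THRESHOLD):
    """True once disk usage has reached the threshold."""
    return fraction_used >= threshold


def days_until_threshold(samples, total_bytes, threshold=THRESHOLD):
    """Estimate the days left before usage reaches the threshold.

    `samples` is a list of (day, used_bytes) pairs, oldest first. The
    growth rate is the slope of a least-squares line through the samples,
    which assumes the linear trend continues. Returns None if usage is
    not growing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    spread = sum((day - mean_day) ** 2 for day, _ in samples)
    if spread == 0:
        raise ValueError("samples must cover more than one day")
    slope = sum((day - mean_day) * (used - mean_used)
                for day, used in samples) / spread  # bytes per day
    if slope <= 0:
        return None
    _, last_used = samples[-1]
    remaining = threshold * total_bytes - last_used
    return max(0.0, remaining / slope)
```

Something like `should_pause_agents(usage_fraction("/"))` could be run
periodically, with `days_until_threshold` giving advance warning. The
path `/` is a placeholder for whichever filesystem stores the nars.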

Chris

Attachment: signature.asc
Description: PGP signature