bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#24496: offloading should fall back to local build after n tries


From: Maxim Cournoyer
Subject: bug#24496: offloading should fall back to local build after n tries
Date: Fri, 17 Dec 2021 16:57:33 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hello Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> zimoun <zimon.toutoune@gmail.com> skribis:
>
>> I am just hitting this old bug#24496 [1].
>>
>> On Mon, 26 Sep 2016 at 18:20, ludo@gnu.org (Ludovic Courtès) wrote:
>>> ng0 <ngillmann@runbox.com> skribis:
>>>
>>>> When I forgot that my build machine is offline and I did not pass
>>>> --no-build-hook, the offloading keeps trying forever until I had to
>>>> cancel the build, boot the build-machine and started the build again.
>>
>> [...]
>>
>>> Like you say, on Hydra-style setup this could be a problem: the
>>> front-end machine may have --max-jobs=0, meaning that it cannot perform
>>> builds on its own.
>>>
>>> So I guess we would need a command-line option to select a different
>>> behavior.  I’m not sure how to do that because ‘guix offload’ is
>>> “hidden” behind ‘guix-daemon’, so there’s no obvious place for such an
>>> option.
>>
>> When the build machine used to offload is offline and the master daemon
>> is --max-jobs=0, I expect X tries (leading to timeout) and then just
>> fails with a hint, where X is defined by user.  WDYT?
>>
>>
>>> In the meantime, you could also hack up your machines.scm: it would
>>> return a list where unreachable machines have been filtered out.
>>
>> Maybe, this could be done by “guix offload”.
>
> Prior to commit efbf5fdd01817ea75de369e3dd2761a85f8f7dd5, this was the
> case: an unreachable machine would have ‘machine-load’ return +inf.0,
> and so it would be discarded from the list of candidates.
>
> However, I think this behavior was unintentionally lost in
> efbf5fdd01817ea75de369e3dd2761a85f8f7dd5.  Maxim, WDYT?

I just reviewed this commit, and don't see anywhere where the behavior
would have changed.  The discarding happens here:

--8<---------------cut here---------------start------------->8---
-         (if (and node (< load 2.) (>= space %minimum-disk-space))
+         (if (and node
+                  (or (not threshold) (< load threshold))
+                  (>= space %minimum-disk-space))
--8<---------------cut here---------------end--------------->8---

previously load could be set to +inf.0.  Now it is a float between 0.0
and 1.0, with threshold defaulting to 0.6.

As far as I remember, this has always been a problem for me (busy
offload machines being forever retried with no fallback to the local
machine).

Thanks,

Maxim





reply via email to

[Prev in Thread] Current Thread [Next in Thread]