bug-guix

bug#53463: ci.guix.gnu.org not building the 'guix' job


From: Ludovic Courtès
Subject: bug#53463: ci.guix.gnu.org not building the 'guix' job
Date: Tue, 08 Feb 2022 11:22:11 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> Oh!  That indicates that it’s failing to offload to one of the
>> ‘localhost’ build machines specified in /etc/guix/machines.scm.
>> Normally there’s an SSH tunnel set up for those, but I guess it broke.
>>
>> Perhaps we can update /etc/guix/machines.scm to refer to armhf-linux
>> machines by their WireGuard IP?
>
> Seems like the right thing to do. This bit is also an unstaged change in
> the berlin maintenance repository, we should commit it. Tobias, could
> you have a look :) ?
>
> +(define powerpc64le
> +  (list
> +   ;; A VM donated/hosted by OSUOSL & administered by nckx.
> +   ;; XXX: SSH tunnel via overdrive1:
> +   ;; ssh -L 2224:p9.tobias.gr:22 hydra@10.0.0.3
> +   #;(build-machine
> +    ;;(name "p9.tobias.gr")
> +    (name "localhost")
> +    (port 2224)
> +    (user "hydra")
> +    (systems '("powerpc64le-linux"))
> +    (host-key "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJEbRxJ6WqnNLYEMNDUKFcdMtyZ9V/6oEfBFSHY8xE6A nckx"))))

IIRC this machine is now running WireGuard, Tobias?  If so, could you
change this to refer to its WireGuard IP and commit it?
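
A rough sketch of what the updated entry might look like, assuming the
SSH tunnel is dropped and sshd on that machine listens on the default
port (so the ‘port’ field can go).  The address "10.0.0.N" is a
placeholder, not the machine's actual WireGuard IP:

```scheme
(define powerpc64le
  (list
   ;; A VM donated/hosted by OSUOSL & administered by nckx.
   (build-machine
    ;; Placeholder: substitute the machine's real WireGuard address.
    (name "10.0.0.N")
    (user "hydra")
    (systems '("powerpc64le-linux"))
    (host-key "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJEbRxJ6WqnNLYEMNDUKFcdMtyZ9V/6oEfBFSHY8xE6A nckx"))))
```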

> I also found that other machines were unreachable and commented them out:
>
>    ;; CPU: 16 ARM Cortex-A72 cores
>    ;; RAM: 32 GB
> -  (list (build-machine
> +  (list #;(build-machine
>           ;;kreuzberg
>           (name "10.0.0.9")
>           (user "hydra")

Ricardo, could you check what’s wrong with kreuzberg?

> @@ -243,13 +256,13 @@
>     ;; BeagleBoard X15 kindly hosted by Simon Josefsson.
>     ;; CPU: Cortex A15 (2 cores)
>     ;; RAM: 2 GB
> -   (build-machine
> +   #;(build-machine
>      (name "10.0.0.5")                   ;guix-x15
>      (user "hydra")
>      (systems '("armhf-linux"))
>      (host-key "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOfXjwCAFWeGiUoOVXEgtIeXxbtymjOTg7ph1ObMAcJ0 root@beaglebone"))
>  
> -   (build-machine
> +   #;(build-machine
>      (name "10.0.0.6")                   ;guix-x15b
>      (user "hydra")
>      (systems '("armhf-linux"))

Oops.

Note that it’s not necessary to comment them all out.  As long as at
least one machine is available for a given system type, we’re fine:
‘guix offload’ will pick it up.
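
To make the "at least one machine per system type" rule concrete, here
is a small standalone Guile sketch.  It does not use the real
‘build-machine’ records: the machine names, the alist layout, and the
‘uncovered-systems’ helper are all made up for illustration.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical stand-in for the machine list: each entry is
;; (NAME . SYSTEMS).  Not the actual machines.scm data.
(define machines
  '(("machine-a" . ("aarch64-linux" "armhf-linux"))
    ("machine-b" . ("armhf-linux"))))

;; System types we want to be able to offload to (illustrative).
(define required-systems
  '("aarch64-linux" "armhf-linux" "powerpc64le-linux"))

(define (uncovered-systems machines systems)
  "Return the elements of SYSTEMS that no entry of MACHINES supports."
  (remove (lambda (system)
            (any (lambda (machine)
                   (member system (cdr machine)))
                 machines))
          systems))

;; Prints (powerpc64le-linux): nothing left to build for that system.
(display (uncovered-systems machines required-systems))
(newline)
```

Commenting out "machine-b" here would change nothing, since "machine-a"
still covers armhf-linux; only a system type with no machine at all is a
problem.  On the real file, ‘guix offload test’ can be used to check
that the listed machines are actually reachable.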

> Nevertheless we are hitting an offload issue here, maybe an occurrence
> of #24496. The offload mechanism should time out when a machine is
> unreachable instead of retrying over and over, causing all evaluation
> processes to hang.

Yes, though the problem here is that some architectures were left with
zero machines IIRC, so it would have failed one way or another.

Thanks!

Ludo’.




