bug#53463: ci.guix.gnu.org not building the 'guix' job
From: Ludovic Courtès
Subject: bug#53463: ci.guix.gnu.org not building the 'guix' job
Date: Tue, 08 Feb 2022 11:22:11 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Hi,
Mathieu Othacehe <othacehe@gnu.org> writes:
>> Oh! That indicates that it’s failing to offload to one of the
>> ‘localhost’ build machines specified in /etc/guix/machines.scm.
>> Normally there’s an SSH tunnel set up for those, but I guess it broke.
>>
>> Perhaps we can update /etc/guix/machines.scm to refer to armhf-linux
>> machines by their WireGuard IP?
>
> Seems like the right thing to do. This bit is also an unstaged change in
> the berlin maintenance repository, we should commit it. Tobias, could
> you have a look :) ?
>
> +(define powerpc64le
> + (list
> + ;; A VM donated/hosted by OSUOSL & administered by nckx.
> + ;; XXX: SSH tunnel via overdrive1:
> + ;; ssh -L 2224:p9.tobias.gr:22 hydra@10.0.0.3
> + #;(build-machine
> + ;;(name "p9.tobias.gr")
> + (name "localhost")
> + (port 2224)
> + (user "hydra")
> + (systems '("powerpc64le-linux"))
> + (host-key "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJEbRxJ6WqnNLYEMNDUKFcdMtyZ9V/6oEfBFSHY8xE6A nckx"))))
IIRC this machine is now running WireGuard, Tobias? If so, could you
change this to refer to its WireGuard IP and commit it?
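For illustration, the entry could then look roughly like the sketch below. The address "10.0.0.N" is only a placeholder, since the machine's actual WireGuard IP is not given in this thread. I am also assuming that the ‘port’ field can be dropped because SSH would be reached on its default port rather than through the tunnel, and that the host key stays the same because it is the same machine.

```scheme
;; Sketch only: "10.0.0.N" is a placeholder, not the real WireGuard
;; address of this machine; replace it before use.
(build-machine
  (name "10.0.0.N")                     ;p9.tobias.gr over WireGuard (placeholder)
  ;; No 'port' field: assumes sshd listens on the default port (22)
  ;; once the SSH tunnel on port 2224 is no longer involved.
  (user "hydra")
  (systems '("powerpc64le-linux"))
  (host-key "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJEbRxJ6WqnNLYEMNDUKFcdMtyZ9V/6oEfBFSHY8xE6A nckx"))
```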
> I also found that other machines were unreachable and commented them:
>
> ;; CPU: 16 ARM Cortex-A72 cores
> ;; RAM: 32 GB
> - (list (build-machine
> + (list #;(build-machine
> ;;kreuzberg
> (name "10.0.0.9")
> (user "hydra")
Ricardo, could you check what’s wrong with kreuzberg?
> @@ -243,13 +256,13 @@
> ;; BeagleBoard X15 kindly hosted by Simon Josefsson.
> ;; CPU: Cortex A15 (2 cores)
> ;; RAM: 2 GB
> - (build-machine
> + #;(build-machine
> (name "10.0.0.5") ;guix-x15
> (user "hydra")
> (systems '("armhf-linux"))
> (host-key "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOfXjwCAFWeGiUoOVXEgtIeXxbtymjOTg7ph1ObMAcJ0 root@beaglebone"))
>
> - (build-machine
> + #;(build-machine
> (name "10.0.0.6") ;guix-x15b
> (user "hydra")
> (systems '("armhf-linux"))
Oops.
Note that it’s not necessary to comment them all out. As long as at
least one machine is available for a given system type, we’re fine:
‘guix offload’ will pick it up.
> Nevertheless we are hitting an offload issue here, maybe an occurrence
> of #24496. The offload mechanism should time out when a machine is
> unreachable instead of retrying over and over, causing all evaluation
> processes to hang.
Yes, though the problem here is that some architectures were left with
zero machines IIRC, so it would have failed one way or another.
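A minimal Guile sketch of how one might check for that situation. It assumes a simplified, hypothetical representation of the machine list: the real machines.scm holds ‘build-machine’ records, and the names ‘machines’ and ‘uncovered-systems’ are made up for this example. The entries mirror only the three machines whose system types appear in the diff above.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical, simplified stand-in for machines.scm.
;; Each entry is (NAME ENABLED? SYSTEM ...); ENABLED? is #f for
;; entries that were commented out with #; in the diff.
(define machines
  '(("localhost" #f "powerpc64le-linux")   ;p9.tobias.gr via SSH tunnel
    ("10.0.0.5"  #f "armhf-linux")         ;guix-x15
    ("10.0.0.6"  #f "armhf-linux")))       ;guix-x15b

(define (uncovered-systems machines)
  "Return the system types for which no entry in MACHINES is enabled."
  (let ((all     (delete-duplicates (append-map cddr machines)))
        (covered (delete-duplicates
                  (append-map cddr (filter cadr machines)))))
    ;; Keep the systems that are listed somewhere but served by no
    ;; enabled machine.
    (lset-difference string=? all covered)))

(display (uncovered-systems machines))
(newline)
```

With both X15 boards disabled as above, "armhf-linux" shows up in the result; flipping either of them back to #t removes it, which is the "at least one machine per system type" rule.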
Thanks!
Ludo’.