[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#62619: Shepherd desertion upon service canonical name change?
From: |
Ludovic Courtès |
Subject: |
bug#62619: Shepherd desertion upon service canonical name change? |
Date: |
Thu, 27 Apr 2023 14:55:38 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) |
Hi Bruno,
Bruno Victal <mirai@makinata.eu> skribis:
> Upon a guix system reconfigure, if a running service has undergone a
> canonical name change since the previous generation
> then shutdown or reboot commands fail, with shepherd indicating itself (root
> service) as stopped.
Oh, fun.
> mirai@guix-nuc ~$ sudo reboot
[…]
> Service xvnc has been stopped.
> Service git-daemon has been stopped.
> Assertion (eq? (canonical-name new) (canonical-name old)) failed.
> assertion-failed()
> Service root has been stopped.
I think this assertion failure is the root cause. It comes from
‘register-services’ in 0.9.3, which reads:
--8<---------------cut here---------------start------------->8---
(define (register-services . new-services)
"Add NEW-SERVICES to the list of known services. If a service has already
been registered, arrange to have it replaced when it is next stopped. If it
is currently stopped, replace it immediately."
(define (register-single-service new)
;; Sanity-checks first.
(assert (list-of-symbols? (provided-by new)))
(assert (list-of-symbols? (required-by new)))
(assert (boolean? (respawn? new)))
;; FIXME: Just because we have a unique canonical name now doesn't mean it
;; will remain unique as other services are added. Whenever a service is
;; added it should check that it's not conflicting with any already
;; registered canonical names.
(match (lookup-services (canonical-name new))
(() ;; empty, so we can safely add ourselves
(for-each (lambda (name)
(let ((old (lookup-services name)))
(hashq-set! %services name (cons new old))))
(provided-by new)))
((old . rest) ;; one service registered, it may be an old version of us
(assert (null? rest))
(assert (eq? (canonical-name new) (canonical-name old)))
(if (running? old)
(slot-set! old 'replacement new)
(replace-service old new)))))
(for-each register-single-service new-services))
--8<---------------cut here---------------end--------------->8---
‘register-services’ was called from ‘replace-service’, itself called
from ‘stop’ (right after ‘networking’ had been actually stopped):
--8<---------------cut here---------------start------------->8---
;; SERVICE is no longer running.
(slot-set! service 'running #f)
;; Reset the list of respawns.
(slot-set! service 'last-respawns '())
;; Replace the service with its replacement, if it has one
(let ((replacement (slot-ref service 'replacement)))
(when replacement
(replace-service service replacement))) ;<- here
--8<---------------cut here---------------end--------------->8---
The assertion failure was thrown here. ‘stop’ calls itself
recursively but it doesn’t catch exceptions from recursive calls:
--8<---------------cut here---------------start------------->8---
(fold-services (lambda (other acc)
(if (and (running? other)
(required-by? service other))
(append (stop other) acc) ;<- here
acc))
'())
--8<---------------cut here---------------end--------------->8---
The problem is that ‘root-service’ marks itself as stopped before it has
effectively shut down the services:
--8<---------------cut here---------------start------------->8---
#:stop (lambda (unused . args)
(local-output (l10n "Exiting shepherd..."))
;; Prevent that we try to stop ourself again.
(slot-set! root-service 'running #f)
(shutdown-services)
(quit))
--8<---------------cut here---------------end--------------->8---
So what happened is that ‘shutdown-services’ threw, the exception wasn’t
caught, and thus it never called ‘quit’. QED.
The service registry in the soon-to-be-released 0.10.0 no longer has
that assertion failure (it vanished in
08510a2a2aaab388c90dd402bd7506d33014454f). Instead, it registers a
replacement for the first service found to have one of the names of the
new service.
The problem of the #:stop method of ‘root-service’ still exists: if
‘shutdown-services’ throws, the ‘root’ service won’t terminate and it’ll
remain in ‘stopping’ status. We could wrap the ‘shutdown-services’ call
in ‘false-if-exception’, but I’d lean towards not doing it: it’s not
supposed to throw, so maybe it’s best not to swallow the exception.
To summarize, I believe the problem is solved in 0.10.
Thanks,
Ludo’.