[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug#32408] [PATCH shepherd] Allow replacement of services

From: Carlo Zancanaro
Subject: [bug#32408] [PATCH shepherd] Allow replacement of services
Date: Tue, 21 Aug 2018 07:16:48 +1000
User-agent: mu4e 1.0; emacs 26.1

Hey Ludo,

On Tue, Aug 21 2018, Ludovic Courtès wrote:
We’ll still want to be able to special-case things like nginx that can be “hot-replaced”, though. So perhaps, in addition to this patch on the Shepherd side, we’ll need extra stuff in (gnu services shepherd).

Yeah, if we expose the replacement field directly, then we'll need some supporting code in (gnu services shepherd), even if it's just to detect whether the service is stopped or not before doing a replacement. Although ideally our interface wouldn't introduce race conditions like that. (See below for more thoughts on this.)

For instance, the ‘actions’ field of <shepherd-service> could, by default, include an “upgrade” action that simply sets the ‘replacement’ slot. For nginx, we’d provide a custom “upgrade” action that does “nginx -s restart” or whatever it is that needs to be done.

‘guix system reconfigure’ would automatically invoke the ‘upgrade’ action for each new service.


How many services can we meaningfully upgrade like this? My understanding is that most of our system services a fed an immutable configuration file, and thus restarting/reloading won't actually upgrade them. In order to make an upgrade action work the service would have to mutate itself into a new correct configuration, as well as restarting/reloading the underlying daemon. It's even trickier if the daemon itself has been upgraded, because then the process will have to be restarted anyway.

At any rate, I think the replacement mechanism (this patch) is just one way that a service can be reloaded. It would probably be a good idea to create a higher-level abstraction over it. I think other mechanisms (like a upgrade/reload action) should be handled on the Guix side of things.

+  (let ((replacement (slot-ref service 'replacement)))
+    (define (copy-slot! slot)
+      (slot-set! service slot (slot-ref replacement slot)))
+    (when replacement
+      (copy-slot! 'provides)
+      (copy-slot! 'requires)
+      (copy-slot! 'respawn?)
+      (copy-slot! 'start)
+      (copy-slot! 'stop)
+      (copy-slot! 'actions)
+      (copy-slot! 'running)
+      (copy-slot! 'docstring))
+    service))

Having a hardcoded list of slots sounds error-prone—surely we’ll forget to update it down the road. I wonder what else could be done.

One option would be to grab the block asyncs and atomically replace the service in the ‘%services’ hash table. Then we only need to copy the ‘last-respawns’ slot to the new service, I believe. (This changes the object identity of the service but I think its OK.)

Another option would be to use GOOPS tricks to iterate over the list of slots and have a list of slots *not* to copy. I’m not a big fan of this option, though.

My favourite option for this would be to separate the <service> object into an immutable <service> and a mutable <service-state>. The <service-state> object would have a reference to a <service> object in order to invoke actions on it, and it could also hold a second <service> object as a replacement. Then the swap would be much more straightforward. I haven't done any real work towards this, though.

In the short term, I'd rather replace it in the %services hash table. I did it by copying slots because I wasn't sure I would get the details of the swap right and didn't have time to properly work out how to do it. I'll give it a go!

+(let ((service (lookup-running 'test)))
+  (slot-set! service 'replacement
+             (make <service>

I wonder if we should let users fiddle with ‘replacement’ directly, or if we should provide a higher-level construct.

For instance, ‘register-services’ could transparently set the ‘replacement’ field for services already registered instead of doing:

    (assert (null? (lookup-services (canonical-name new))))

Not sure if there are cases where this behavior would be undesirable, though.


With this current patch the replacement field is only checked at the point when the service is stopped, so the field could only be set when the service is actually running. I think it makes the most sense to just replace the service directly if it's not stopped.

I can't think of any undesirable cases, but having a higher-level interface is a good idea. At the very least we need to control the inherent race condition involved in (if running? do-x do-y) for if the service is stopped after the running? check. At the moment I think the only thing we have to worry about there is signals, but if we're going to move to have more parallelism through fibers then we might need to be even more careful.

I'll try to send through an updated patch later this week.


Attachment: signature.asc
Description: PGP signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]