emacs-bug-tracker

From: GNU bug Tracking System
Subject: bug#56674: closed ([Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can cause deadlocks)
Date: Thu, 17 Nov 2022 10:24:04 +0000

Your message dated Thu, 17 Nov 2022 11:23:09 +0100
with message-id <87a64pkhgy.fsf@gnu.org>
and subject line Re: bug#58926: Shepherd becomes unresponsive after an interrupt
has caused the debbugs.gnu.org bug report #58926,
regarding [Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can 
cause deadlocks
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
58926: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=58926
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: [Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can cause deadlocks
Date: Wed, 20 Jul 2022 23:39:08 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.1 (gnu/linux)

Hi!

We’ve just had a bad experience with the nginx service on berlin, where
‘herd restart nginx’ would cause shepherd to get stuck forever in
‘waitpid’ on the process that was supposed to start nginx.

The details are unclear, but one thing that is clear is that using ‘waitpid’
(either directly or indirectly with ‘system*’, which is what
‘nginx-service-type’ does) is not great:

  1. In the best case, shepherd (as of 0.9.1) is stuck while ‘system*’
     is in ‘waitpid’ waiting for child process completion (“stuck” as
     in: doesn’t do anything, not even answering ‘herd’ requests or
     inetd connections).

  2. I don’t think that can happen with ‘system*’ (because it’s in C),
     but generally speaking, there’s a possibility that shepherd’s event
     loop will reap a terminated child process before some other
     user-made ‘waitpid’ call does, in which case that call fails with
     ECHILD.
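
A minimal Guile sketch of the second problem, outside of shepherd: a stand-in for the event loop reaps a terminated child with a non-blocking ‘waitpid’, after which the service-side ‘(waitpid child)’ has nothing left to wait for and raises a ‘system-error’ with errno ECHILD. The variable names are made up for illustration; this is not shepherd code, and the one-second ‘sleep’ is only a crude way to let the child exit first.

```scheme
(use-modules (ice-9 match))

;; Fork a child process that exits right away with status 0.
(define child
  (match (primitive-fork)
    (0 (primitive-_exit 0))
    (pid pid)))

;; Give the child time to terminate (crude, but enough for a demo).
(sleep 1)

;; Stand-in for the event loop: reap whichever child has terminated,
;; without blocking.  Returns a (PID . STATUS) pair.
(define reaped (waitpid WAIT_ANY WNOHANG))

;; Stand-in for service code that started CHILD and now waits for it.
;; CHILD was already reaped above, so 'waitpid' raises a 'system-error'
;; exception instead of returning a status; we keep its errno.
(define service-wait-errno
  (catch 'system-error
    (lambda () (waitpid child) #f)        ;#f would mean waitpid succeeded
    (lambda args (system-error-errno args))))

(format #t "event loop reaped PID ~a~%" (car reaped))
(format #t "service code's waitpid: ~a~%"
        (if service-wait-errno (strerror service-wait-errno) "no error"))
```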

Anyway, that’s a bad situation.

So I can think of several ways to address it:

  1. Change the nginx service ‘stop’ method to just
     (make-kill-destructor), which should work just as well as invoking
     “nginx -s stop”.

  2. Have Shepherd provide a replacement for ‘system*’.
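
A rough Guile sketch of the idea behind the second option, assuming the event loop is the only code allowed to call ‘waitpid’: service code starts a process and registers a callback instead of waiting, and a reaper run by the event loop (for instance upon SIGCHLD) hands each exit code to the right callback. ‘launch-command’, ‘reap-children!’ and ‘%exit-callbacks’ are hypothetical names; this is not a drop-in replacement for ‘system*’ (it does not return the status synchronously), nor the interface shepherd actually provides.

```scheme
(use-modules (ice-9 match))

;; Hypothetical registry: child PID -> procedure to call when it terminates.
(define %exit-callbacks (make-hash-table))

(define (launch-command command on-exit)
  "Start COMMAND, a list of strings whose first element is the program,
in a child process and return its PID immediately.  ON-EXIT is called
later with the child's exit code, or #f if it was killed by a signal."
  (match (primitive-fork)
    (0
     ;; Child: replace this process with COMMAND; exit 127 if exec fails.
     (catch #t
       (lambda () (apply execlp (car command) command))
       (lambda _ (primitive-_exit 127))))
    (pid
     ;; Parent: remember whom to notify, but do not wait.
     (hashv-set! %exit-callbacks pid on-exit)
     pid)))

(define (reap-children!)
  "Reap all terminated children without blocking and notify their
registered callbacks.  Meant to be the only caller of 'waitpid', invoked
from the event loop, e.g. when SIGCHLD is received."
  (let loop ()
    (match (catch 'system-error
             (lambda () (waitpid WAIT_ANY WNOHANG))
             (lambda _ '(0 . 0)))       ;treat errors such as ECHILD as nothing to reap
      ((0 . _) #t)                      ;no terminated child right now
      ((pid . status)
       (let ((on-exit (hashv-ref %exit-callbacks pid)))
         (hashv-remove! %exit-callbacks pid)
         (when on-exit
           (on-exit (status:exit-val status))))
       (loop)))))
```

For instance, ‘(launch-command '("nginx" "-s" "stop") (lambda (code) ...))’ returns right away, and the callback runs the next time the loop calls ‘(reap-children!)’, so no other code ever competes with the loop for the child's status.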

Thoughts?

Ludo’.



--- End Message ---
--- Begin Message ---
Subject: Re: bug#58926: Shepherd becomes unresponsive after an interrupt
Date: Thu, 17 Nov 2022 11:23:09 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Hi,

Ludovic Courtès <ludo@gnu.org> wrote:

> Mathieu Othacehe <othacehe@gnu.org> wrote:
>
>> 1. On my laptop with a Wireguard service trying to reach a non-existing
>> DNS server.
>>
>>             (service wireguard-service-type
>>                      (wireguard-configuration
>>                       (addresses (list "10.0.0.2/24"))
>>                       (dns '("10.0.0.50")))) ; does not exist
>
> This one is similar to:
>
>   https://issues.guix.gnu.org/53225
>   https://issues.guix.gnu.org/53381
>
> It has to do with the fact that “wg-quick up” blocks until it succeeds
> and that ‘invoke’ gets stuck on ‘waitpid’ until the “wg-quick” process
> terminates.
>
> The solution will be to use something non-blocking instead of ‘invoke’;
> I’m looking into it.
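
For contrast with ‘invoke’, which waits until the child terminates, here is a Guile sketch of a bounded wait that polls with WNOHANG. It only caps how long the caller is stuck (the caller still stalls for up to SECONDS), it is not the fix that went into the Shepherd 0.9.3, and ‘wait-for-pid/timeout’ is a hypothetical helper. Like any direct ‘waitpid’ call, it is also exposed to the reaping race from the original report: if the event loop reaps PID first, ‘waitpid’ raises ECHILD.

```scheme
(use-modules (ice-9 match))

(define (wait-for-pid/timeout pid seconds)
  "Poll for the termination of PID, a child of this process, for at most
SECONDS seconds.  Return its wait status, or #f if it is still running
when the deadline passes."
  (let loop ((polls-left (* 10 seconds)))   ;one poll every 100 ms
    (match (waitpid pid WNOHANG)            ;never blocks
      ((0 . _)                              ;still running
       (if (> polls-left 0)
           (begin
             (usleep 100000)
             (loop (- polls-left 1)))
           #f))
      ((_ . status)                         ;terminated and now reaped
       status))))
```

A caller that gets #f back, say for a “wg-quick up” that never completes, could then send the process SIGTERM with ‘(kill pid SIGTERM)’ and reap it, rather than waiting forever.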

This is fixed in the Shepherd 0.9.3, which landed in Guix commit
283d7318c5b312d7129adb6dbeea6ad205ce89d1.

As I wrote, I’m not sure whether it fixes the nginx situation, since I
could not reproduce it.  I’m closing this bug; let’s open a new issue
specifically for nginx if it comes up again with 0.9.3.

Thanks,
Ludo’.


--- End Message ---
