bug-guix

bug#41948: Shepherd deadlocks


From: Mathieu Othacehe
Subject: bug#41948: Shepherd deadlocks
Date: Sun, 16 Aug 2020 11:56:37 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Hey Ludo,

> We should be able to reproduce it with much simpler tests then, right?
> Like maybe “while : ; do herd restart guix-daemon ; done” or similar?

Well, I tried that without success. Then I took a closer look at the
strace log.

Turns out there are two concurrent "finalizer" threads:

--8<---------------cut here---------------start------------->8---
1     clone(child_stack=0x7f17981e6fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[271], tls=0x7f17981e7700, child_tidptr=0x7f17981e79d0) = 271
--8<---------------cut here---------------end--------------->8---

and this one,

--8<---------------cut here---------------start------------->8---
217   <... clone resumed>, parent_tid=[253], tls=0x7f1799309700, child_tidptr=0x7f17993099d0) = 253
--8<---------------cut here---------------end--------------->8---

The first one is spawned from Shepherd directly. The other one is
spawned from the forked process in "marionette-shepherd-service".

Those two finalizer threads share the same pipe. When we try to
stop the finalizer thread in Shepherd, right before forking a new
process, we send a '\1' byte to the finalizer pipe.

--8<---------------cut here---------------start------------->8---
1     write(6, "\1", 1 <unfinished ...>
--8<---------------cut here---------------end--------------->8---

which is received (line 183597) by the marionette finalizer thread
rather than the Shepherd one:

--8<---------------cut here---------------start------------->8---
253   <... read resumed>"\1", 1)        = 1
--8<---------------cut here---------------end--------------->8---

Then we pthread_join the Shepherd finalizer thread, which never got the
byte and therefore never stops! Quite unfortunate.
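
The mechanism can be illustrated outside Guile. The following is not the
attached t.scm; it is a minimal Python sketch, assuming that a plain pipe
stands in for the finalizer pipe and that a forked child stands in for
the marionette finalizer thread. The function name is made up for the
example. In this sketch the parent has no reader of its own, so the child
always wins; in the real case, which thread gets the byte depends on
scheduling.

```python
import os
import select


def steal_wakeup_byte():
    """Fork a child that reads from a pipe inherited from the parent.

    Returns (child_exit_status, parent_has_data).
    """
    # Hypothetical stand-in for the finalizer pipe.
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: plays the finalizer thread of the forked process. It blocks
        # on the inherited read end and reports the byte via its exit status.
        data = os.read(read_fd, 1)
        os._exit(data[0] if data else 255)

    # Parent: send the "stop" byte, like write(6, "\1", 1) in the trace.
    os.write(write_fd, b"\x01")

    # Wait until the child has consumed the byte and exited.
    _, status = os.waitpid(pid, 0)

    # Poll the parent's read end without blocking: nothing is left, so a
    # reader thread in the parent would keep waiting here.
    readable, _, _ = select.select([read_fd], [], [], 0)

    os.close(read_fd)
    os.close(write_fd)
    return os.WEXITSTATUS(status), bool(readable)
```

Calling steal_wakeup_byte() shows the child exiting with the value of the
byte it consumed, while the parent's read end stays empty.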

A small reproducer is attached. So unless I'm wrong, this is a Guile
issue that can cause any program making at least two primitive-fork
calls to hang.

I'm quite convinced that those two bugs are directly related:

* https://issues.guix.info/31925
* https://issues.guix.gnu.org/42353

Now, regarding the fix for this issue, I guess that a process forked with
"primitive-fork" in Guile should close its parent's finalizer pipe and
open a new one.
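
A rough sketch of that idea, in Python rather than Guile and in no way an
actual Guile patch: it assumes a plain pipe stands in for the finalizer
pipe, and the function name and the 0.2 second timeout are arbitrary
choices for the example. The child closes both inherited ends and opens a
private pipe, so the parent's stop byte can only be read by the parent.

```python
import os
import select


def fork_with_fresh_pipe():
    """Fork a child that replaces the inherited pipe with its own.

    Returns (byte_read_by_parent, child_exit_status).
    """
    # Hypothetical stand-in for the finalizer pipe.
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: drop both inherited ends, then open a private pipe.
        os.close(read_fd)
        os.close(write_fd)
        child_read, child_write = os.pipe()
        # The parent's byte must not show up here; wait briefly to check.
        readable, _, _ = select.select([child_read], [], [], 0.2)
        os._exit(1 if readable else 0)

    # Parent: send the stop byte, then read it back as its own
    # finalizer thread would.
    os.write(write_fd, b"\x01")
    data = os.read(read_fd, 1)

    _, status = os.waitpid(pid, 0)
    os.close(read_fd)
    os.close(write_fd)
    return data, os.WEXITSTATUS(status)
```

Here the parent reads back its own byte, and the child exits with status
0 because nothing arrived on its private pipe.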

WDYT?

Thanks,

Mathieu

Attachment: t.scm
Description: Binary data

