bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#41948: Shepherd deadlocks


From: Mathieu Othacehe
Subject: bug#41948: Shepherd deadlocks
Date: Fri, 19 Jun 2020 10:41:14 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Hello,

When running "gui-installed-desktop-os-encrypted" test, Shepherd seems
to deadlock when restarting "guix-daemon". This can happen at different
stages:

* In "umount-cow-store" procedure, just before finishing the install.

* During "set-http-proxy" tests inside the marionette.
  
This is not always reproducible. In order to gather some information, I
created a Shepherd "strace" service that logs what's happening in
Shepherd itself (patch attached).

It seems that, just after blocking signals, in "fork+exec-command", I
guess, Shepherd is taking a lock:

--8<---------------cut here---------------start------------->8---
183553:1     chdir("/")                        = 0
183554:1     clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, 
tv_nsec=4387222}) = 0
183555:1     rt_sigprocmask(SIG_BLOCK, NULL, [HUP INT TERM CHLD], 8) = 0
183560:1     madvise(0x7f179782d000, 12288, MADV_DONTNEED) = 0
183561:1     clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, 
tv_nsec=13592357}) = 0
183562:1     clone(child_stack=0x7f17981e6fb0, 
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
 parent_tid=[271], tls=0x7f17981e7700, child_tidptr=0x7f17981e79d0) = 271
183563:1     stat("/etc/localtime", {st_mode=S_IFREG|0444, st_size=1920, ...}) 
= 0
183564:1     write(15, "shepherd[1]: changing HTTP/HTTPS"..., 86 <unfinished 
...>
183566:1     <... write resumed>)              = 86
183575:1     getpgid(169)                      = 169
183576:1     getpgid(0)                        = 0
183577:1     kill(-169, SIGTERM <unfinished ...>
183579:1     <... kill resumed>)               = 0
183582:1     stat("/etc/localtime", {st_mode=S_IFREG|0444, st_size=1920, ...}) 
= 0
183583:1     write(15, "shepherd[1]: Service guix-daemon"..., 51 <unfinished 
...>
183585:1     <... write resumed>)              = 51
183594:1     getuid()                          = 0
183595:1     rt_sigprocmask(SIG_BLOCK, [HUP INT TERM CHLD], [HUP INT TERM 
CHLD], 8) = 0
183596:1     write(6, "\1", 1 <unfinished ...>
183598:1     <... write resumed>)              = 1
183605:1     futex(0x7f17981e79d0, FUTEX_WAIT, 271, NULL <unfinished ...>
--8<---------------cut here---------------end--------------->8---

and is then blocking forever.

I cannot find any trace of locking in Shepherd, except for
"without-automatic-finalization" procedure.

Thanks,

Mathieu
>From 13ac475ad953509cbbf88547e4f72efc08d31055 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe <othacehe@gnu.org>
Date: Thu, 18 Jun 2020 14:13:15 +0200
Subject: [PATCH] strace service

---
 gnu/services/base.scm  |  3 +++
 gnu/services/linux.scm | 25 ++++++++++++++++++++++++-
 gnu/system/install.scm |  3 +++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index 6ea7ef8e7e..d80c6cca6d 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -34,6 +34,7 @@
   #:use-module (guix deprecation)
   #:use-module (gnu services)
   #:use-module (gnu services admin)
+  #:use-module (gnu services linux)
   #:use-module (gnu services shepherd)
   #:use-module (gnu system pam)
   #:use-module (gnu system shadow)                ; 'user-account', etc.
@@ -2386,6 +2387,8 @@ to handle."
         (service guix-service-type)
         (service nscd-service-type)
 
+        (service strace-service-type 1)
+
         (service rottlog-service-type)
 
         ;; The LVM2 rules are needed as soon as LVM2 or the device-mapper is
diff --git a/gnu/services/linux.scm b/gnu/services/linux.scm
index 12934c2084..96889dfb7f 100644
--- a/gnu/services/linux.scm
+++ b/gnu/services/linux.scm
@@ -42,7 +42,9 @@
             earlyoom-configuration-send-notification-command
             earlyoom-service-type
 
-            kernel-module-loader-service-type))
+            kernel-module-loader-service-type
+
+            strace-service-type))
 
 
 ;;;
@@ -177,3 +179,24 @@ representation."
    (compose concatenate)
    (extend append)
    (default-value '())))
+
+
+;;;
+;;; Strace service.
+;;;
+
+(define strace-service-type
+  (shepherd-service-type
+   'strace
+   (lambda (pid)
+     (shepherd-service
+      (requirement '(root-file-system user-processes))
+      (provision (list (string->symbol
+                        (string-append "strace-" (number->string pid)))))
+      (documentation "Strace a running process.")
+      (start #~(make-forkexec-constructor
+                (list #$(file-append strace "/bin/strace")
+                      "-f" "-p" (number->string #$pid)
+                      "-o" (string-append "/var/log/strace-"
+                                          (number->string #$pid) ".log"))))
+      (stop   #~(make-kill-destructor))))))
diff --git a/gnu/system/install.scm b/gnu/system/install.scm
index d0ff2e7c52..f2f1e9414c 100644
--- a/gnu/system/install.scm
+++ b/gnu/system/install.scm
@@ -34,6 +34,7 @@
   #:use-module (gnu installer)
   #:use-module (gnu system locale)
   #:use-module (gnu services dbus)
+  #:use-module (gnu services linux)
   #:use-module (gnu services networking)
   #:use-module (gnu services shepherd)
   #:use-module (gnu services ssh)
@@ -378,6 +379,8 @@ Access documentation at any time by pressing Alt-F2.\x1b[0m
           ;; regulations-compliant WiFi access.
           (udev-service #:rules (list lvm2 crda))
 
+          (service strace-service-type 1)
+
           ;; Add the 'cow-store' service, which users have to start manually
           ;; since it takes the installation directory as an argument.
           (cow-store-service)
-- 
2.26.2

Attachment: strace-1.log.xz
Description: application/xz


reply via email to

[Prev in Thread] Current Thread [Next in Thread]