qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] "make check" random hangs


From: Peter Maydell
Subject: [Qemu-devel] "make check" random hangs
Date: Thu, 18 Sep 2014 11:22:48 -0700

I've noticed recently that "make check" is prone to random hangs
(you can see this in Travis results but also I've caught it occurring
on my local machine). Looking at one recent example:

petmay01 30042  0.0  0.0  24892 14208 pts/13   S+   17:09   0:00
       \_ make -C build/clang check
petmay01 31106  0.0  0.0   5336   644 pts/13   S+   17:10   0:00
           \_ /bin/sh -c echo "GTESTER check-qtest-i386" &&
QTEST_QEMU_BINARY=i386-softmmu/qemu-system-i386
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k
-q -m=quick tests/endianness-test tests/fdc-test tests/ide-test
tests/hd-geo-test tests/boot-order-test tests/bios-tables-test
tests/rtc-test tests/i440fx-test tests/fw_cfg-test tests/blockdev-test
tests/qdev-monitor-test tests/wdt_ib700-test tests/e1000-test
tests/rtl8139-test tests/pcnet-test tests/eepro100-test
tests/ne2000-test tests/nvme-test tests/ac97-test tests/es1370-test
tests/virtio-net-test tests/virtio-balloon-test tests/virtio-blk-test
tests/virtio-rng-test tests/virtio-scsi-test tests/virtio-9p-test
tests/virtio-serial-test tests/virtio-console-test tests/tpci200-test
tests/ipoctal232-test tests/display-vga-test tests/intel-hda-test
tests/vmxnet3-test tests/pvpanic-test tests/i82801b11-test
tests/ioh3420-test tests/usb-hcd-ohci-test tests/usb-hcd-uhci-test
tests/usb-hcd-ehci-test tests/usb-hcd-xhci-test tests/vhost-user-test
tests/qom-test
petmay01 31107  0.0  0.0  79500  1060 pts/13   Sl+  17:10   0:00
               \_ gtester -k -q -m=quick tests/endianness-test
tests/fdc-test tests/ide-test tests/hd-geo-test tests/boot-order-test
tests/bios-tables-test tests/rtc-test tests/i440fx-test
tests/fw_cfg-test tests/blockdev-test tests/qdev-monitor-test
tests/wdt_ib700-test tests/e1000-test tests/rtl8139-test
tests/pcnet-test tests/eepro100-test tests/ne2000-test tests/nvme-test
tests/ac97-test tests/es1370-test tests/virtio-net-test
tests/virtio-balloon-test tests/virtio-blk-test tests/virtio-rng-test
tests/virtio-scsi-test tests/virtio-9p-test tests/virtio-serial-test
tests/virtio-console-test tests/tpci200-test tests/ipoctal232-test
tests/display-vga-test tests/intel-hda-test tests/vmxnet3-test
tests/pvpanic-test tests/i82801b11-test tests/ioh3420-test
tests/usb-hcd-ohci-test tests/usb-hcd-uhci-test
tests/usb-hcd-ehci-test tests/usb-hcd-xhci-test tests/vhost-user-test
tests/qom-test
petmay01 31173  0.0  0.0 230280  6308 pts/13   S+   17:10   0:00
                   \_ tests/boot-order-test --quiet --keep-going
-m=quick --GTestLogFD=6
petmay01 31178  0.0  0.2 529580 39536 pts/13   Sl+  17:10   0:00
                       \_ i386-softmmu/qemu-system-i386 -qtest
unix:/tmp/qtest-31173.sock,nowait -qtest-log /dev/null -qmp
unix:/tmp/qtest-31173.qmp,nowait -machine accel=qtest -display none
-nodefaults -boot c


we see the "boot-order-test" has got stuck. Backtrace of
the boot-order-test binary:

(gdb) bt
#0  0x00002b566029ee9c in __libc_waitpid (pid=31178, stat_loc=0x0,
options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
#1  0x00002b5659d27fbf in kill_qemu (s=<optimised out>) at
/home/petmay01/linaro/qemu-for-merges/tests/libqtest.c:109
#2  0x00002b5659d27de5 in qtest_quit (s=0x2b5659da4e00) at
/home/petmay01/linaro/qemu-for-merges/tests/libqtest.c:219
#3  0x00002b5659d24154 in test_a_boot_order (test_args=<optimised
out>, read_boot_order=0x0, expected_boot=<optimised out>,
    expected_reboot=<optimised out>, machine=<optimised out>) at
/home/petmay01/linaro/qemu-for-merges/tests/boot-order-test.c:52
#4  test_boot_orders (machine=<optimised out>, read_boot_order=0x0,
tests=0x2b5659f51310 <test_cases_pc>)
    at /home/petmay01/linaro/qemu-for-merges/tests/boot-order-test.c:63
#5  0x00002b5659d23da3 in test_pc_boot_order () at
/home/petmay01/linaro/qemu-for-merges/tests/boot-order-test.c:112
#6  0x00002b565b689a91 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x00002b565b689c56 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#8  0x00002b565b689c56 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#9  0x00002b565b689fab in g_test_run_suite () from
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00002b565b689fe1 in g_test_run () from
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00002b5659d23cb6 in main (argc=1, argv=0x7fffdfd7dee8) at
/home/petmay01/linaro/qemu-for-merges/tests/boot-order-test.c:208

Backtrace of qemu binary:

(gdb) info thread
  Id   Target Id         Frame
  2    Thread 0x2b13e7360700 (LWP 31179) "qemu-system-i38"
0x00002b13db8a90d1 in do_sigwait (sig=0x2b13e735fb64, set=<optimised
out>)
    at 
../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:60
* 1    Thread 0x2b13d05bc480 (LWP 31178) "qemu-system-i38"
0x00002b13dc7d107f in __GI_ppoll (fds=0x2b13d06a89f0, nfds=5,
    timeout=<optimised out>, sigmask=0x0) at
../sysdeps/unix/sysv/linux/ppoll.c:56
(gdb) bt
#0  0x00002b13dc7d107f in __GI_ppoll (fds=0x2b13d06a89f0, nfds=5,
timeout=<optimised out>, sigmask=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x00002b13d01daf8e in qemu_poll_ns (fds=<optimised out>,
nfds=<optimised out>, timeout=<optimised out>)
    at /home/petmay01/linaro/qemu-for-merges/qemu-timer.c:319
#2  0x00002b13d01d7010 in os_host_main_loop_wait (timeout=<optimised
out>) at /home/petmay01/linaro/qemu-for-merges/main-loop.c:229
#3  main_loop_wait (nonblocking=<optimised out>) at
/home/petmay01/linaro/qemu-for-merges/main-loop.c:484
#4  0x00002b13cfb9166a in main_loop () at
/home/petmay01/linaro/qemu-for-merges/vl.c:2032
#5  main (argc=<optimised out>, argv=<optimised out>,
envp=0x7fff0acf5d70) at
/home/petmay01/linaro/qemu-for-merges/vl.c:4573
(gdb) thread 2
[Switching to thread 2 (Thread 0x2b13e7360700 (LWP 31179))]
#0  0x00002b13db8a90d1 in do_sigwait (sig=0x2b13e735fb64, set=<optimised out>)
    at 
../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:60
60    
../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:
No such file or directory.
(gdb) bt
#0  0x00002b13db8a90d1 in do_sigwait (sig=0x2b13e735fb64, set=<optimised out>)
    at 
../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:60
#1  __sigwait (set=0x2b13e735fb68, sig=0x2b13e735fb64) at
../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:97
#2  0x00002b13cf7c1b3b in qemu_dummy_cpu_thread_fn (arg=<optimised
out>) at /home/petmay01/linaro/qemu-for-merges/cpus.c:982
#3  0x00002b13db8a1182 in start_thread (arg=0x2b13e7360700) at
pthread_create.c:312
#4  0x00002b13dc7de30d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

So it looks like QEMU has somehow failed to respond to
the SIGTERM that the test wrapper sends. If you manually
kill the QEMU binary then (naturally) the 'make check'
run continues happily.

Looking at variables in the qemu process indicates we
did take the sigterm:
(gdb) print shutdown_signal
$3 = 15
(gdb) print shutdown_pid
$4 = 31173

In the poll we're polling on timeout:
{tv_sec = 0x6dfd, tv_nsec = 0x3b8ad410}
and fds:
(gdb) print fds[0]
$8 = {fd = 3, events = 25, revents = 0}
(gdb) print fds[1]
$9 = {fd = 5, events = 1, revents = 0}
(gdb) print fds[2]
$10 = {fd = 7, events = 1, revents = 0}
(gdb) print fds[3]
$11 = {fd = 8, events = 1, revents = 0}
(gdb) print fds[4]
$12 = {fd = 4, events = 1, revents = 0}

The aio context which should have got its notifier
prodded when we did the shutdown request:

(gdb) print *qemu_aio_context
$5 = {source = {callback_data = 0x0, callback_funcs = 0x0,
source_funcs = 0x2b13d0fe61b8 <aio_source_funcs>, ref_count = 2,
    context = 0x2b13d0618070, priority = 0, flags = 1, source_id = 1,
poll_fds = 0x2b13d06069c0, prev = 0x0, next = 0x2b13d06251d0,
    name = 0x0, priv = 0x2b13d0617800}, lock = {lock = {lock = {__data
= {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2,
          __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
        __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22
times>, __align = 0}}, head = 0, tail = 0, cond = {cond = {__data = {
          __lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
__woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
        __size = '\000' <repeats 47 times>, __align = 0}},
owner_thread = {thread = 0}, nesting = 0, cb = 0x2b13d019ff10
<aio_rfifolock_cb>,
    cb_opaque = 0x2b13d0617600}, aio_handlers = {lh_first =
0x2b13d0617760}, walking_handlers = 0, dispatching = false, bh_lock =
{lock = {
      __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
__kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0,
          __next = 0x0}}, __size = '\000' <repeats 16 times>, "\002",
'\000' <repeats 22 times>, __align = 0}}, first_bh = 0x2b13d066c2f0,
  walking_bh = 0, notifier = {rfd = 4, wfd = 4}, pollfds =
0x2b13d0616e30, thread_pool = 0x0, tlg = {tl = {0x2b13d0616cc0,
0x2b13d0616d30,
      0x2b13d0618000}}}

So we should have prodded a filedescriptor which
the poll is listening on (and indeed sending another
SIGTERM does result in the poll waking up and the
qemu process exiting).

It looks like we have a race condition of some kind :-(

Any suggestions for:
 (a) why QEMU might have ignored this SIGTERM
 (b) whether libqtest should be doing something more
     than just sending one SIGTERM (SIGKILL?)

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]