From: Rafael David Tinoco
Subject: [Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Date: Tue, 10 Sep 2019 02:04:59 -0000
Alright,
I'm still investigating this, but wanted to share some findings... I
haven't got a kernel dump yet after the task freezes; I have analyzed
only the userland part of it (although I did check, with perf
cycles:u/cycles:k, whether code was running inside the kernel at some
point).
The big picture is this: whenever qemu-img hangs, we have 3 hung tasks,
basically with these stacks:
----
THREAD #1
__GI_ppoll (../sysdeps/unix/sysv/linux/ppoll.c:39)
ppoll (/usr/include/aarch64-linux-gnu/bits/poll2.h:77)
qemu_poll_ns (./util/qemu-timer.c:322)
os_host_main_loop_wait (./util/main-loop.c:233)
main_loop_wait (./util/main-loop.c:497)
convert_do_copy (./qemu-img.c:1981)
img_convert (./qemu-img.c:2457)
main (./qemu-img.c:4976)
got stack traces:
./33293/stack ./33293/stack
[<0>] __switch_to+0xc0/0x218 [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0 [<0>] do_sys_poll+0x508/0x5c0
[<0>] get_signal+0x5a4/0x730 [<0>] __arm64_sys_ppoll+0xc0/0x118
[<0>] do_notify_resume+0x158/0x358 [<0>] el0_svc_common+0xa0/0x168
[<0>] work_pending+0x8/0x10 [<0>] el0_svc_handler+0x38/0x78
[<0>] el0_svc+0x8/0xc
root@d06-1:~$ perf record -F 9999 -e cycles:u -p 33293 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.871 MB perf.data (48730 samples) ]
root@d06-1:~$ perf report --stdio
# Overhead Command Shared Object Symbol
# ........ ........ .................. ......................
#
37.82% qemu-img libc-2.29.so [.] 0x00000000000df710
21.81% qemu-img [unknown] [k] 0xffff000010099504
14.23% qemu-img [unknown] [k] 0xffff000010085dc0
9.13% qemu-img [unknown] [k] 0xffff00001008fff8
6.47% qemu-img libc-2.29.so [.] 0x00000000000df708
5.69% qemu-img qemu-img [.] qemu_event_reset
2.57% qemu-img libc-2.29.so [.] 0x00000000000df678
0.63% qemu-img libc-2.29.so [.] 0x00000000000df700
0.49% qemu-img libc-2.29.so [.] __sigtimedwait
0.42% qemu-img libpthread-2.29.so [.] __libc_sigwait
----
THREAD #3
__GI___sigtimedwait (../sysdeps/unix/sysv/linux/sigtimedwait.c:29)
__sigwait (linux/sigwait.c:28)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)
./33303/stack ./33303/stack
[<0>] __switch_to+0xc0/0x218 [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0 [<0>] do_sigtimedwait.isra.9+0x194/0x288
[<0>] get_signal+0x5a4/0x730 [<0>] __arm64_sys_rt_sigtimedwait+0xac/0x110
[<0>] do_notify_resume+0x158/0x358 [<0>] el0_svc_common+0xa0/0x168
[<0>] work_pending+0x8/0x10 [<0>] el0_svc_handler+0x38/0x78
[<0>] el0_svc+0x8/0xc
root@d06-1:~$ perf record -F 9999 -e cycles:u -p 33303 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.905 MB perf.data (49647 samples) ]
root@d06-1:~$ perf report --stdio
# Overhead Command Shared Object Symbol
# ........ ........ .................. ......................
#
45.37% qemu-img libc-2.29.so [.] 0x00000000000df710
23.52% qemu-img [unknown] [k] 0xffff000010099504
9.08% qemu-img [unknown] [k] 0xffff00001008fff8
8.89% qemu-img [unknown] [k] 0xffff000010085dc0
5.56% qemu-img libc-2.29.so [.] 0x00000000000df708
3.66% qemu-img libc-2.29.so [.] 0x00000000000df678
1.01% qemu-img libc-2.29.so [.] __sigtimedwait
0.80% qemu-img libc-2.29.so [.] 0x00000000000df700
0.64% qemu-img qemu-img [.] qemu_event_reset
0.55% qemu-img libc-2.29.so [.] 0x00000000000df718
0.52% qemu-img libpthread-2.29.so [.] __libc_sigwait
----
THREAD #2
syscall (linux/aarch64/syscall.S:38)
qemu_futex_wait (./util/qemu-thread-posix.c:438)
qemu_event_wait (./util/qemu-thread-posix.c:442)
call_rcu_thread (./util/rcu.c:261)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)
./33302/stack ./33302/stack
[<0>] __switch_to+0xc0/0x218 [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0 [<0>] ptrace_stop+0x148/0x2b0
[<0>] get_signal+0x5a4/0x730 [<0>] get_signal+0x5a4/0x730
[<0>] do_notify_resume+0x1c4/0x358 [<0>] do_notify_resume+0x1c4/0x358
[<0>] work_pending+0x8/0x10 [<0>] work_pending+0x8/0x10
<stack does not change at all>
root@d06-1:~$ perf report --stdio
# Overhead Command Shared Object Symbol
# ........ ........ .................. ......................
#
50.30% qemu-img libc-2.29.so [.] 0x00000000000df710
26.44% qemu-img [unknown] [k] 0xffff000010099504
5.88% qemu-img libc-2.29.so [.] 0x00000000000df708
5.26% qemu-img [unknown] [k] 0xffff000010085dc0
5.25% qemu-img [unknown] [k] 0xffff00001008fff8
4.25% qemu-img libc-2.29.so [.] 0x00000000000df678
0.93% qemu-img libc-2.29.so [.] __sigtimedwait
0.51% qemu-img libc-2.29.so [.] 0x00000000000df700
0.35% qemu-img libpthread-2.29.so [.] __libc_sigwait
Their stacks show those tasks are pretty much "stuck" in the same
userland program logic, while one of them is stuck at the same program
counter address. Profiling those tasks gives not much information
without more debugging data and fewer optimizations.
All the 0x00000000000dfXXX addresses seem broken, though, considering
where libc was actually mapped (mid heap), since we have:
(gdb) print __libc_sigwait
$25 = {int (const sigset_t *, int *)} 0xffffbf128080 <__GI___sigwait>
----
Anyway, continuing... I investigated the qemu_event_{set,reset,wait}
logic. On non-Linux OSes it uses pthread primitives but, for Linux, it
uses a futex() implementation, with a struct QemuEvent
(rcu_call_ready_event) being the one holding the state values (busy,
set, free, etc).
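For reference, the Linux path boils down to a futex(2) wait/wake pair on the 32-bit ev->value word. A minimal standalone sketch (not QEMU's exact code; the EV_* encoding is assumed from util/qemu-thread-posix.c):

```c
/* Standalone sketch of the futex path QEMU uses on Linux for QemuEvent.
 * EV_* values assumed from util/qemu-thread-posix.c; not QEMU's code. */
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#define EV_SET   0   /* event signalled                   */
#define EV_FREE  1   /* event reset, no waiters           */
#define EV_BUSY -1   /* event reset, waiters on the futex */

/* Sleep until *addr stops holding val (the kernel compares 32-bit words). */
static void futex_wait(unsigned *addr, unsigned val)
{
    for (;;) {
        if (syscall(SYS_futex, addr, FUTEX_WAIT, val, NULL, NULL, 0) == 0) {
            return;         /* woken by a FUTEX_WAKE on addr */
        }
        if (errno == EINTR) {
            continue;       /* interrupted by a signal: retry */
        }
        return;             /* EAGAIN: *addr != val, no need to sleep */
    }
}

/* Wake every thread currently sleeping on addr. */
static void futex_wake_all(unsigned *addr)
{
    syscall(SYS_futex, addr, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}
```

qemu_event_set() and qemu_event_wait() then drive the EV_SET/EV_FREE/EV_BUSY state machine on top of these two primitives.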
I got 2 hung situations:
(gdb) print (struct QemuEvent) *(0xaaaacd35fce8)
$16 = {
value = 4294967295,
initialized = true
}
value = 4294967295 -> this is 0xFFFFFFFF, i.e. -1 read through a 32-bit
unsigned (a casting vs overflow issue? or simply EV_BUSY, which is
defined as -1)
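A quick sanity check on that first case (EV_BUSY's -1 value is assumed from util/qemu-thread-posix.c): 4294967295 is exactly what -1 looks like when stored in the unsigned 32-bit ev->value word, so it may be a legitimate EV_BUSY rather than corruption:

```c
/* 4294967295 == 0xFFFFFFFF == (unsigned)-1, i.e. EV_BUSY read as unsigned.
 * EV_BUSY's value (-1) is assumed from util/qemu-thread-posix.c. */
#define EV_BUSY -1

/* How the 32-bit ev->value word stores a negative state constant. */
static unsigned as_event_word(int v)
{
    return (unsigned)v;
}
```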
AND
a situation where value was either 0 or 1 (as expected). In this last
situation I changed things by hand to make the program continue its
execution:
void qemu_event_wait(QemuEvent *ev)
{
    unsigned value;

    assert(ev->initialized);
    value = atomic_read(&ev->value);
    smp_mb_acquire();
    if (value != EV_SET) {
        if (value == EV_FREE) {
            if (atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
                return;
            }
        }
        qemu_futex_wait(ev, EV_BUSY);
    }
}
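Note that atomic_cmpxchg() returns the value it found in memory, so the early return above only fires when a concurrent qemu_event_set() raced the EV_FREE->EV_BUSY transition. A standalone model of that check (EV_* values assumed from the QEMU source, GCC __atomic builtins standing in for QEMU's macros):

```c
/* Model of the cmpxchg check in qemu_event_wait(); EV_* values assumed
 * from util/qemu-thread-posix.c, not QEMU's literal code. */
#include <stdbool.h>

#define EV_SET   0
#define EV_FREE  1
#define EV_BUSY -1

/* Compare-and-swap returning the OLD value, like QEMU's atomic_cmpxchg(). */
static unsigned cmpxchg(unsigned *ptr, unsigned expected, unsigned desired)
{
    __atomic_compare_exchange_n(ptr, &expected, desired, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;   /* on failure, holds the value actually found */
}

/* Returns true when the waiter may return without sleeping on the futex:
 * only if a concurrent setter already moved the word to EV_SET. */
static bool try_claim_busy(unsigned *value)
{
    /* EV_FREE -> EV_BUSY: tell qemu_event_set() that waiters exist. */
    return cmpxchg(value, EV_FREE, (unsigned)EV_BUSY) == EV_SET;
}
```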
438 in ./util/qemu-thread-posix.c
0x0000aaaaaabd4174 <+44>: mov w1, #0xffffffff // #-1
0x0000aaaaaabd4178 <+48>: ldaxr w0, [x19]
0x0000aaaaaabd417c <+52>: cmp w0, #0x1
0x0000aaaaaabd4180 <+56>: b.ne 0xaaaaaabd418c <qemu_event_wait+68> // b.any
=> 0x0000aaaaaabd4184 <+60>: stlxr w2, w1, [x19]
0x0000aaaaaabd4188 <+64>: cbnz w2, 0xaaaaaabd4178 <qemu_event_wait+48>
0x0000aaaaaabd418c <+68>: cbz w0, 0xaaaaaabd41cc <qemu_event_wait+132>
0x0000aaaaaabd4190 <+72>: mov w6, #0x0 // #0
0x0000aaaaaabd4194 <+76>: mov x5, #0x0 // #0
0x0000aaaaaabd4198 <+80>: mov x4, #0x0 // #0
0x0000aaaaaabd419c <+84>: mov w3, #0xffffffff // #-1
0x0000aaaaaabd41a0 <+88>: mov w2, #0x0 // #0
0x0000aaaaaabd41a4 <+92>: mov x1, x19
0x0000aaaaaabd41a8 <+96>: mov x0, #0x62 // #98
0x0000aaaaaabd41ac <+100>: bl 0xaaaaaaaff380 <syscall@plt>
I unblocked it by hand, setting the program counter register past that
logic:
(gdb) print qemu_event_wait+132
$15 = (void (*)(QemuEvent *)) 0xaaaaaabd41cc <qemu_event_wait+132>
(gdb) print rcu_call_ready_event
$16 = {value = 1, initialized = true}
(gdb) set rcu_call_ready_event->value=0
(gdb) set $pc=0xaaaaaabd41cc
And it got stuck again with program counter in other STLXR instruction:
(gdb) thread 2
[Switching to thread 2 (Thread 0xffffbec61d90 (LWP 33302))]
#0  0x0000aaaaaabd4110 in qemu_event_reset (ev=0xaaaaaac87ce8 <rcu_call_ready_event>)
    at ./util/qemu-thread-posix.c:414
414     ./util/qemu-thread-posix.c: No such file or directory.
(gdb) bt
#0  0x0000aaaaaabd4110 in qemu_event_reset (ev=0xaaaaaac87ce8 <rcu_call_ready_event>)
    at ./util/qemu-thread-posix.c:414
#1  0x0000aaaaaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:255
#2  0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#3  0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486
#4  0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) print rcu_call_ready_event
$20 = {value = 1, initialized = true}
(gdb) disassemble qemu_event_reset
Dump of assembler code for function qemu_event_reset:
0x0000aaaaaabd40f0 <+0>: ldrb w1, [x0, #4]
0x0000aaaaaabd40f4 <+4>: cbz w1, 0xaaaaaabd411c <qemu_event_reset+44>
0x0000aaaaaabd40f8 <+8>: ldr w1, [x0]
0x0000aaaaaabd40fc <+12>: dmb ishld
0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 <qemu_event_reset+24>
0x0000aaaaaabd4104 <+20>: ret
0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0]
0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1
=> 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0]
0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108 <qemu_event_reset+24>
0x0000aaaaaabd4118 <+40>: ret
And it does not matter if I continue; the CPU stays stuck at that same
program counter (again at a STLXR instruction).
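For what it's worth, that ldaxr/orr/stlxr sequence is the compiler's expansion of an atomic OR: qemu_event_reset() does the EV_SET -> EV_FREE transition with atomic_or() so a reset can never clobber a concurrent EV_BUSY, and on ARMv8.0 (without LSE atomics) every such read-modify-write becomes an exclusive-load/exclusive-store retry loop. A hypothetical C equivalent of what the stuck loop implements (EV_* values assumed from the QEMU source):

```c
/* Model of qemu_event_reset(); EV_* values assumed from
 * util/qemu-thread-posix.c. The atomic OR is what GCC lowers to the
 * ldaxr/orr/stlxr retry loop on ARMv8.0 without LSE atomics. */
#define EV_SET  0
#define EV_FREE 1

static void event_reset(unsigned *value)
{
    unsigned old = __atomic_load_n(value, __ATOMIC_ACQUIRE);
    if (old == EV_SET) {
        /* OR, not a plain store: if a waiter concurrently flipped the
         * word to EV_BUSY (-1, all bits set), OR-ing in EV_FREE leaves
         * it EV_BUSY instead of silently dropping the waiter. */
        __atomic_fetch_or(value, EV_FREE, __ATOMIC_SEQ_CST);
    }
}
```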
----
So, initially I was afraid that missing barriers (or not-strong-enough
ones being used) could have caused a race condition that would make one
thread depend on the other thread's logic.
Unfortunately, it looks like the STLXR instruction might not be behaving
appropriately on this CPU/architecture, as the program counter seems to
be stuck at the same instruction (which is super weird, since it does
not raise a general exception for, say, a microcode issue).
But this was just an initial overview; I still have to revisit this in
order to interpret the results better (and recompile qemu with debugging
data, and possibly with another GCC version).
Any comments are appreciated.
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256
Title:
qemu-img hangs on high core count ARM system
Status in QEMU:
Confirmed
Status in qemu package in Ubuntu:
In Progress
Bug description:
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
qcow2->qcow2 conversion happens to be something uvtool does every time
it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0  0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
    timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
    __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
    timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
#4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5  0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6  img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7  0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182)
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions