[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event lo
From: |
Rafael David Tinoco |
Subject: |
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images |
Date: |
Tue, 10 Sep 2019 22:56:25 -0000 |
QEMU BUG: #1
Alright, one of the issues is (according to comment #14):
"""
Meaning that code is waiting for a futex inside kernel.
(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}
The QemuEvent "rcu_call_ready_event->value" is set to INT_MAX and I
don't know why yet.
rcu_call_ready_event->value is only touched by:
qemu_event_init() -> bool init ? EV_SET : EV_FREE
qemu_event_reset() -> atomic_or(&ev->value, EV_FREE)
qemu_event_set() -> atomic_xchg(&ev->value, EV_SET)
qemu_event_wait() -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)'
"""
Now I know why rcu_call_ready_event->value is set to INT_MAX. That is
because in the following declaration:
struct QemuEvent {
#ifndef __linux__
pthread_mutex_t lock;
pthread_cond_t cond;
#endif
unsigned value;
bool initialized;
};
#define EV_SET 0
#define EV_FREE 1
#define EV_BUSY -1
"value" is declared as unsigned, but EV_BUSY sets it to -1, and,
according to the Two's Complement Operation
(https://en.wikipedia.org/wiki/Two%27s_complement), it will be INT_MAX
(4294967295).
So this is the "first bug" found AND it is definitely funny that this
hasn't been seen in other architectures at all... I can reproduce it at
will.
With that said, it seems that there is still another issue causing (less
frequently):
(gdb) thread 2
[Switching to thread 2 (Thread 0xffffbec5ad90 (LWP 17459))]
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/aarch64/syscall.S: No such file or directory.
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0x0000aaaaaabd41cc in qemu_futex_wait (val=<optimized out>, f=<optimized
out>) at ./util/qemu-thread-posix.c:438
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>) at
./util/qemu-thread-posix.c:442
#3 0x0000aaaaaabed05c in call_rcu_thread (opaque=opaque@entry=0x0) at
./util/rcu.c:261
#4 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at
./util/qemu-thread-posix.c:498
#5 0x0000ffffbf25c880 in start_thread (arg=0xfffffffff5bf) at
pthread_create.c:486
#6 0x0000ffffbf1b6b9c in thread_start () at
../sysdeps/unix/sysv/linux/aarch64/clone.S:78
Thread 2 to be stuck at "futex()" kernel syscall (like the FUTEX_WAKE
never happened and/or wasn't atomic for this arch/binary). Need to
investigate this also.
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256
Title:
qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
converting images
Status in QEMU:
In Progress
Status in qemu package in Ubuntu:
In Progress
Bug description:
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
qcow2->qcow2 conversion happens to be something uvtool does every time
it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0,
nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized
out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at
qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at
qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at
qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182)
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions