
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues


From: dann frazier
Subject: Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
Date: Mon, 7 Oct 2019 08:44:32 -0600
User-agent: Mutt/1.10.1 (2018-07-13)

On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> > 
> > From my understanding this is not sufficient for SMP sync.
> > 
> > If I read this comment correctly:
> > 
> >     void aio_notify(AioContext *ctx)
> >     {
> >         /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >          * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >          */
> >         smp_mb();
> >         if (ctx->notify_me) {
> > 
> > it points out that the smp_mb() should be paired. But as
> > I said, the atomics used don't generate any barriers at all.
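
The pairing Jan describes is the classic two-sided store/barrier/load pattern
between the notifier and the poller. A minimal, stand-alone C11 sketch of that
pattern follows; notify_me and scheduled are placeholders for ctx->notify_me
and bh->scheduled, not QEMU's actual code:

    /* Compile with -std=c11; placeholder names, illustration only. */
    #include <stdatomic.h>

    static atomic_int notify_me;   /* poller announces interest here       */
    static atomic_int scheduled;   /* notifier publishes pending work here */

    /* aio_notify() side: publish the work, then check for a sleeping poller. */
    static void notifier_side(void)
    {
        atomic_store_explicit(&scheduled, 1, memory_order_relaxed);
        /* Full barrier (the smp_mb() quoted above): the store to scheduled
         * must be visible before notify_me is read. */
        atomic_thread_fence(memory_order_seq_cst);
        if (atomic_load_explicit(&notify_me, memory_order_relaxed)) {
            /* wake the poller, e.g. event_notifier_set() */
        }
    }

    /* aio_poll()/aio_ctx_prepare() side: announce interest, then re-check
     * for pending work before blocking. */
    static void poller_side(void)
    {
        atomic_fetch_add_explicit(&notify_me, 2, memory_order_relaxed);
        /* The pairing barrier: if either side lacks it, the load below can
         * be reordered before the add, both threads miss each other, and
         * the poller blocks forever, which matches the observed lockup. */
        atomic_thread_fence(memory_order_seq_cst);
        if (!atomic_load_explicit(&scheduled, memory_order_relaxed)) {
            /* safe to block in ppoll()/futex */
        }
    }

One quick sanity check for Jan's observation is to disassemble the built
object (for example with objdump -d util/async.o) and look for dmb
instructions around the notify_me accesses; the sketch above assumes the
fences really do reach the generated aarch64 code.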
> 
> Based on the rest of the thread, this patch should also fix the bug:
> 
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>      aio_notify_accept(ctx);
>  
>      for (bh = ctx->first_bh; bh; bh = bh->next) {
> -        if (bh->scheduled) {
> +        if (atomic_mb_read(&bh->scheduled)) {
>              return true;
>          }
>      }
> 
> 
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
> 
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>  
>  void aio_notify(AioContext *ctx)
>  {
> -    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> -     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> +    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> +     * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
> +     * atomic_add in aio_poll.
>       */
> -    smp_mb();
> -    if (ctx->notify_me) {
> +    if (atomic_mb_read(&ctx->notify_me)) {
>          event_notifier_set(&ctx->notifier);
>          atomic_mb_set(&ctx->notified, true);
>      }
> 
> 
> Would you be able to test these (one by one possibly)?
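
In C11 terms, the second hunk trades an explicit full fence followed by a
plain load for a single sequentially consistent load, which is what Paolo's
description of atomic_mb_read() as a SEQ_CST load amounts to. A hypothetical
side-by-side, again with a placeholder variable rather than QEMU's macros:

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int notify_me;   /* placeholder for ctx->notify_me */

    /* Before: smp_mb(); if (ctx->notify_me) { ... } */
    static bool need_notify_fence(void)
    {
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&notify_me, memory_order_relaxed) != 0;
    }

    /* After: if (atomic_mb_read(&ctx->notify_me)) { ... } */
    static bool need_notify_seq_cst_load(void)
    {
        return atomic_load_explicit(&notify_me, memory_order_seq_cst) != 0;
    }

In the C11 model a SEQ_CST fence is stronger than a SEQ_CST load, so the two
forms are not interchangeable in general; as the test result below shows, the
hang still reproduced on a Hi1620 with both patches applied.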

Paolo,
  I tried them both, separately and together, on a Hi1620 system; each
time it hung in the first iteration. Here's a backtrace of a run with
both patches applied:

(gdb) thread apply all bt

Thread 3 (Thread 0xffff8154b820 (LWP 63900)):
#0  0x0000ffff8b9402cc in __GI___sigtimedwait (set=<optimized out>, set@entry=0xaaaaf1e08070, info=info@entry=0xffff8154ad98, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1  0x0000ffff8ba77fac in __sigwait (set=set@entry=0xaaaaf1e08070, sig=sig@entry=0xffff8154ae74) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2  0x0000aaaab7dc1610 in sigwait_compat (opaque=0xaaaaf1e08070) at util/compatfd.c:35
#3  0x0000aaaab7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#4  0x0000ffff8ba6d088 in start_thread (arg=0xffffceefbf4f) at pthread_create.c:463
#5  0x0000ffff8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0xffff81d4c820 (LWP 63899)):
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0x0000aaaab7dc4cd8 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0xaaaab7e48708 <rcu_call_ready_event>) at util/qemu-thread-posix.c:459
#3  0x0000aaaab7ddf44c in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:260
#4  0x0000aaaab7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#5  0x0000ffff8ba6d088 in start_thread (arg=0xffffceefc05f) at pthread_create.c:463
#6  0x0000ffff8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0xffff81e83010 (LWP 63898)):
#0  0x0000ffff8b9d4154 in __GI_ppoll (fds=0xaaaaf1e0dbc0, nfds=187650205809964, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffceefbef0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000aaaab7dbedb0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:340
#3  0x0000aaaab7dbfd2c in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:236
#4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
#5  0x0000aaaab7ce86e8 in convert_do_copy (s=0xffffceefc068) at qemu-img.c:2028
#6  img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2520
#7  0x0000aaaab7ce1e54 in main (argc=8, argv=<optimized out>) at qemu-img.c:5097

> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue within ~20
> > iterations).
> 
> Sorry for asking the obvious---500 iterations of what?

$ for i in $(seq 1 500); do echo "==$i=="; ./qemu/qemu-img convert -p -f qcow2 -O qcow2 bionic-server-cloudimg-arm64.img out.img; done
==1==
    (37.19/100%)

  -dann


