Re: [Qemu-arm] qemu_futex_wait() lockups in ARM64: 2 possible issues
From: Paolo Bonzini
Subject: Re: [Qemu-arm] qemu_futex_wait() lockups in ARM64: 2 possible issues
Date: Wed, 11 Sep 2019 15:17:20 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
Note that the RCU thread is expected to sit most of the time doing
nothing, so I don't think this matters.
Zhengui's theory that notify_me doesn't work properly on ARM is more
promising, but he couldn't provide a clear explanation of why he thought
notify_me is involved. In particular, I would have expected notify_me to
be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
glib_pollfds_fill
  g_main_context_prepare
    aio_ctx_prepare
      atomic_or(&ctx->notify_me, 1)
qemu_poll_ns
glib_pollfds_poll
  g_main_context_check
    aio_ctx_check
      atomic_and(&ctx->notify_me, ~1)
  g_main_context_dispatch
    aio_ctx_dispatch
      /* do something for event */
      qemu_poll_ns
but all backtraces show thread 1 in os_host_main_loop_wait:
Thread 1 (Thread 0x40000b573370 (LWP 27214)):
#0  0x000040000a489020 in ppoll () from /lib64/libc.so.6
#1  0x0000aaaaaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
#3  0x0000aaaaaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
#4  0x0000aaaaaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
#5  0x0000aaaaaad97be0 in convert_do_copy (s=0xffffdc32eb48) at qemu-img.c:1923
#6  0x0000aaaaaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
#7  0x0000aaaaaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305
Could you put your util/async.o object file somewhere so that I can look at it?
Anyway:
On 11/09/19 04:15, Rafael David Tinoco wrote:
> I've caught the following stack trace after a hang in qemu-img convert:
>
> (gdb) bt
> #0 syscall ()
> #1 0x0000aaaaaabd41cc in qemu_futex_wait
> #2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
> #3 0x0000aaaaaabed05c in call_rcu_thread
> #4 0x0000aaaaaabd34c8 in qemu_thread_start
> #5 0x0000ffffbf25c880 in start_thread
> #6 0x0000ffffbf1b6b9c in thread_start ()
>
> (gdb) print rcu_call_ready_event
> $4 = {value = 4294967295, initialized = true}
>
> value INT_MAX (4294967295) seems WRONG for qemu_futex_wait():
This is UINT_MAX, not INT_MAX. qemu_futex_wait() doesn't care about the
signedness of the value, which is why the argument is declared as void *.
(That said, passing "&ev->value" instead of "ev" would indeed be nicer.)
> - EV_BUSY, being -1 and passed as an argument to qemu_futex_wait(void *,
> unsigned), is in two's complement, making the argument INT_MAX when
> that's not what is expected (unless I missed something).
>
> *** If that is the case, unsure if you, Paolo, prefer declaring
> *(QemuEvent)->value as an integer, or whether changing EV_BUSY to "2"
> would be okay here ***
You could change it to 3, but it has to have all the bits of EV_FREE set
(see atomic_or(&ev->value, EV_FREE) in qemu_event_reset).
You could also change it to -1u, but I don't see a particular need to do so.
> BUG: description:
> https://bugs.launchpad.net/qemu/+bug/1805256/comments/15
>
> ========
> ISSUE #2
> ========
>
> I found this when debugging lockups while in futex() in a specific ARM64
> server - https://bugs.launchpad.net/qemu/+bug/1805256 - which I'm still
> investigating.
>
> After fixing the issue above, I'm still getting stuck into:
If you changed it to 2, that change is wrong: as noted above, 2 does not
include the EV_FREE bit.
> - Should qemu_event_set() check return code from
> qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY
> waiter was ever woken up ? Maybe even loop until at least 1 is awaken ?
Why would it need to do so?
Paolo