From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH] AioContext: fix broken ctx->dispatching optimization
Date: Wed, 15 Jul 2015 22:14:35 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 15.07.2015 at 19:13, Paolo Bonzini wrote:
> This patch rewrites the ctx->dispatching optimization, which was the cause
> of some mysterious hangs that could be reproduced on aarch64 KVM only.
> The hangs were indirectly caused by aio_poll() and in particular by
> flash memory updates' call to blk_write(), which invokes aio_poll().
> Fun stuff: they had an extremely short race window, so much that
> adding all kind of tracing to either the kernel or QEMU made it
> go away (a single printf made it half as reproducible).
> [...]

Awesome commit message. :-)

Took a while to read, think about and understand everything, but you did
a great job of describing the bug. In the end, all the information I
needed was there.

I'm not sure if you want to trust my review, considering that I failed
the first time around, but for the most part, this looks good to me.

> diff --git a/aio-win32.c b/aio-win32.c
> index 233d8f5..ae7c6cf 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -318,11 +313,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      first = true;
>  
>      /* wait until next event */
> -    while (count > 0) {
> +    do {
>          HANDLE event;
>          int ret;
>  
> -        timeout = blocking
> +        timeout = blocking && !have_select_revents
>              ? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0;
>          if (timeout) {
>              aio_context_release(ctx);

Here I'm not sure why you switched to a do..while loop. It's not obvious
to me how the change from aio_set_dispatching() to ctx->notify_me is
related to this.

Does it mean that we can call WaitForMultipleObjects() with nCount == 0?
This seems to be forbidden.
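
Just to illustrate what I mean about the loop shape (a standalone
sketch, not the aio-win32.c code; wait_once() is a made-up stand-in for
the WaitForMultipleObjects() call):

    #include <stdio.h>

    /* stand-in for the blocking wait; pretends nothing became ready */
    static int wait_once(int count, int timeout)
    {
        printf("waiting on %d handles for %d ms\n", count, timeout);
        return 0;
    }

    int main(void)
    {
        int count = 0;  /* no handles registered */

        /* old shape: with count == 0 the body, and with it the
         * blocking wait, never runs at all */
        while (count > 0) {
            wait_once(count, 100);
            break;  /* the real code breaks out once events are handled */
        }

        /* new shape: the wait runs at least once even with count == 0,
         * which is exactly the case where WaitForMultipleObjects()
         * would see nCount == 0 */
        do {
            wait_once(count, 100);
        } while (0);  /* the real code keeps looping while events remain */

        return 0;
    }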

> diff --git a/tests/test-aio.c b/tests/test-aio.c
> index a7cb5c9..217e337 100644
> --- a/tests/test-aio.c
> +++ b/tests/test-aio.c
> @@ -97,14 +97,6 @@ static void event_ready_cb(EventNotifier *e)
>  
>  /* Tests using aio_*.  */
>  
> -static void test_notify(void)
> -{
> -    g_assert(!aio_poll(ctx, false));
> -    aio_notify(ctx);
> -    g_assert(!aio_poll(ctx, true));
> -    g_assert(!aio_poll(ctx, false));
> -}
> -
>  typedef struct {
>      QemuMutex start_lock;
>      bool thread_acquired;

Okay. We don't actually notify any more, so this test hangs now.
Removing it seems fine.
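
For reference, my reading of where it would hang (an annotation of the
removed test above, not code from the patch):

    /* with the patch, a bare aio_notify() outside of somebody's
     * ctx->notify_me window no longer leaves anything pending, so: */
    aio_notify(ctx);                 /* no observable effect any more */
    g_assert(!aio_poll(ctx, true));  /* blocks forever: nothing wakes it */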

> @@ -331,7 +323,7 @@ static void test_wait_event_notifier(void)
>      EventNotifierTestData data = { .n = 0, .active = 1 };
>      event_notifier_init(&data.e, false);
>      aio_set_event_notifier(ctx, &data.e, event_ready_cb);
> -    g_assert(!aio_poll(ctx, false));
> +    while (aio_poll(ctx, false));
>      g_assert_cmpint(data.n, ==, 0);
>      g_assert_cmpint(data.active, ==, 1);
>  
> @@ -356,7 +348,7 @@ static void test_flush_event_notifier(void)
>      EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
>      event_notifier_init(&data.e, false);
>      aio_set_event_notifier(ctx, &data.e, event_ready_cb);
> -    g_assert(!aio_poll(ctx, false));
> +    while (aio_poll(ctx, false));
>      g_assert_cmpint(data.n, ==, 0);
>      g_assert_cmpint(data.active, ==, 10);

These two hunks I don't understand. Where would an event come from that
needs to be handled here?

With these hunks reverted, the test still passes for me.
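
If the intent is to make the setup state explicit, the new form reads
as a drain loop (my reading of the idiom, annotating the hunk above,
not necessarily why it was changed):

    /* drain anything left over from setup: keep doing non-blocking
     * polls until one reports that no handler made progress */
    while (aio_poll(ctx, false)) {
        /* a handler ran; poll again */
    }
    /* from here on, data.n can't change without a new event */
    g_assert_cmpint(data.n, ==, 0);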

> @@ -494,14 +486,6 @@ static void test_timer_schedule(void)
>   *   works well, and that's what I am using.
>   */
>  
> -static void test_source_notify(void)
> -{
> -    while (g_main_context_iteration(NULL, false));
> -    aio_notify(ctx);
> -    g_assert(g_main_context_iteration(NULL, true));
> -    g_assert(!g_main_context_iteration(NULL, false));
> -}

Same as above, good.

>  static void test_source_flush(void)
>  {
>      g_assert(!g_main_context_iteration(NULL, false));
> @@ -669,7 +653,7 @@ static void test_source_wait_event_notifier(void)
>      EventNotifierTestData data = { .n = 0, .active = 1 };
>      event_notifier_init(&data.e, false);
>      aio_set_event_notifier(ctx, &data.e, event_ready_cb);
> -    g_assert(g_main_context_iteration(NULL, false));
> +    while (g_main_context_iteration(NULL, false));
>      g_assert_cmpint(data.n, ==, 0);
>      g_assert_cmpint(data.active, ==, 1);
>  
> @@ -694,7 +678,7 @@ static void test_source_flush_event_notifier(void)
>      EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
>      event_notifier_init(&data.e, false);
>      aio_set_event_notifier(ctx, &data.e, event_ready_cb);
> -    g_assert(g_main_context_iteration(NULL, false));
> +    while (g_main_context_iteration(NULL, false));
>      g_assert_cmpint(data.n, ==, 0);
>      g_assert_cmpint(data.active, ==, 10);

Okay, this one is confusing. Took me a while to realise that aio_poll()
returns false when the only event is an aio_notify(), whereas
g_main_context_iteration() returns true in the same case.

With this information, I understand what has changed: the return value
of g_main_context_iteration() goes from true before this patch (the
aio_notify() from aio_set_fd_handler() was still pending) to false
after it (aio_notify() doesn't inject an event any more).

This should mean that, like above, we can assert that the first iteration
returns false, i.e. reverse the assertion (and indeed, with this
change the test still passes for me).
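
To summarise the two return-value contracts as I understand them
(annotation only, not authoritative):

    /* aio_poll(ctx, blocking):
     *   true  - some handler made progress
     *   false - nothing but the internal notifier fired, so a bare
     *           aio_notify() yields false
     *
     * g_main_context_iteration(NULL, may_block):
     *   true  - any source dispatched, including the AioContext
     *           GSource woken only by aio_notify()
     *
     * after this patch nothing is left pending here, hence: */
    g_assert(!g_main_context_iteration(NULL, false));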

Kevin


