Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
From: Stefan Hajnoczi
Subject: Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
Date: Thu, 14 Dec 2023 14:52:25 -0500
On Thu, Dec 14, 2023 at 12:10:32AM +0100, Paolo Bonzini wrote:
> On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Alternatives welcome! (A cleaner version of this approach might be to forbid
> > cross-thread aio_set_fd_handler() calls and to refactor all
> > aio_set_fd_handler() callers so they come from the AioContext's home thread.
> > I'm starting to think that only the aio_notify() and aio_schedule_bh() APIs
> > should be thread-safe.)
>
> I think that's pretty hard because aio_set_fd_handler() is a pretty
> important part of the handoff from one AioContext to another and also
> of drained_begin()/end(), and both of these things run in the main
> thread.
>
> Regarding how to solve this issue, there is a lot of
> "underdocumenting" of the locking policy in aio-posix.c, and indeed it
> makes running aio_set_fd_handler() in the target AioContext tempting;
> but it is also scary to rely on the iothread being able to react
> quickly. I'm also worried that we're changing the logic just because
> we don't understand the old one, but then we add technical debt.
>
> So, as a first step, I would take inspiration from the block layer
> locking work, and add assertions to functions like poll_set_started()
> or find_aio_handler(). Is the list_lock elevated (nonzero)? Or locked?
> Are we in the iothread? And likewise, for each list, does insertion
> happen from the iothread or with the list_lock taken (and possibly
> elevated)? Does removal happen from the iothread or with list_lock
> zero+taken?
>
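A minimal sketch of what one such assertion could look like (the helper name is made up, not existing QEMU code; qemu_get_current_aio_context() and qemu_lockcnt_count() are the existing primitives I would expect it to build on):

    /* Hypothetical helper: anyone walking the handler lists must either
     * be in the AioContext's home thread or have elevated ctx->list_lock.
     */
    static void assert_handler_list_access(AioContext *ctx)
    {
        if (qemu_get_current_aio_context() == ctx) {
            return;                                  /* home thread is safe */
        }
        assert(qemu_lockcnt_count(&ctx->list_lock) > 0);  /* else: elevated */
    }
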
> After this step, we should have a clearer idea of the possible states
> of the node (based on the lists, the state is a subset of
> {poll_started, deleted, ready}) and draw a nice graph of the
> transitions. We should also understand if any calls to
> QLIST_IS_INSERTED() have correctness issues.
>
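For reference, the three per-node states mentioned above could be pictured as bits (purely illustrative; in aio-posix.c the state is implied by which lists the node is on, not stored as an explicit enum):

    /* Illustrative only: QEMU encodes these via list membership
     * (node_poll, node_deleted, node_ready), not an explicit field.
     */
    enum {
        NODE_POLL_STARTED = 1 << 0,  /* io_poll_begin() called, no io_poll_end() yet */
        NODE_DELETED      = 1 << 1,  /* queued on ctx->deleted_aio_handlers */
        NODE_READY        = 1 << 2,  /* queued for dispatch as ready */
    };
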
> Good news, I don't think any memory barriers are needed here. One
> thing that we already do correctly is that, once a node is deleted, we
> try to skip work; see for example poll_set_started(). This also
> provides a good place to do cleanup work for deleted nodes, including
> calling poll_end(): namely aio_free_deleted_handlers(), because it runs with
> list_lock zero and taken, just like the tail of
> aio_remove_fd_handler(). It's the safest possible place to do cleanup
> and to take a lock. Therefore we have:
>
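To make that concrete, a rough sketch of where the deferred poll_end() call could live (simplified and from memory; field and list names may not match aio-posix.c exactly):

    /* Sketch: this runs with list_lock zero and taken, so nobody else can
     * be walking the handler lists while deleted nodes are cleaned up.
     */
    while ((node = QLIST_FIRST_RCU(&ctx->deleted_aio_handlers))) {
        /* "poll_started" is a hypothetical per-node flag; a real patch
         * would have to derive this from the existing bookkeeping.
         */
        if (node->poll_started && node->io_poll_end) {
            node->io_poll_end(node->opaque);    /* undo io_poll_begin() */
        }
        QLIST_REMOVE(node, node_deleted);
        g_free(node);
    }
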
> - a fast path in the iothread that runs without any concurrency with
> stuff happening in the main thread
>
> - a slow path in the iothread that runs with list_lock zero and taken.
> The slow path shares logic with the main thread, meaning that
> aio_free_deleted_handlers() and aio_remove_fd_handler() should share
> some functions called by both.
>
> If the code is organized this way, any wrong bits should jump out more
> easily. For example, these two lines in aio_remove_fd_handler() are
> clearly misplaced
>
> node->pfd.revents = 0;
> node->poll_ready = false;
>
> because they run in the main thread but they touch iothread data! They
> should be after qemu_lockcnt_count() is checked to be zero.
>
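In other words, something like this inside aio_remove_fd_handler() (sketch only, surrounding code abbreviated):

    /* Sketch: only touch data the iothread may still be using once we
     * know no reader is walking the lists.
     */
    if (qemu_lockcnt_count(&ctx->list_lock)) {
        /* a reader is in flight: just mark the node deleted and let
         * aio_free_deleted_handlers() finish the job */
        QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
        return false;
    }

    /* list_lock is zero and we hold it, so this is now safe */
    node->pfd.revents = 0;
    node->poll_ready = false;
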
> Regarding the call to io_poll_ready(), I would hope that it is
> unnecessary; in other words, that after drained_end() the virtqueue
> notification would be raised. Yes, virtio_queue_set_notification is
> edge triggered rather than level triggered, so it would be necessary
> to add a check with virtio_queue_host_notifier_aio_poll() and
> virtio_queue_host_notifier_aio_poll_ready() in
> virtio_queue_aio_attach_host_notifier, but that does not seem too bad
> because virtio is the only user of the io_poll_begin and io_poll_end
> callbacks. It would have to be documented though.
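If I understand the suggestion, virtio_queue_aio_attach_host_notifier() would gain a check roughly like this after re-attaching the notifier (sketch only, not tested; it assumes the poll/poll_ready pair can simply be called back to back here):

    /* After re-attaching, catch a request that snuck in while
     * notifications were disabled during drain.
     */
    if (virtio_queue_host_notifier_aio_poll(&vq->host_notifier)) {
        virtio_queue_host_notifier_aio_poll_ready(&vq->host_notifier);
    }
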
I think Hanna had the same idea: document that ->io_poll_end() isn't
called by aio_set_fd_handler() and shift the responsibility onto the
caller to get back into a state where notifications are enabled before
they add the fd with aio_set_fd_handler() again.
In a little more detail, the caller needs to do the following before
adding the fd back with aio_set_fd_handler() again:
1. Call ->io_poll_end().
2. Poll one more time in case an event slipped in; if one did, either write to
the eventfd so the fd is immediately readable or call ->io_poll_ready()
directly (see the sketch below).
I think this is more or less what you described above.
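Roughly, the caller-side pattern would look like this (the my_* names are placeholders for the device's own callbacks, and the aio_set_fd_handler() argument list is from memory):

    my_io_poll_end(opaque);              /* 1. re-enable notifications */
    if (my_io_poll(opaque)) {            /* 2. an event slipped in while */
        my_io_poll_ready(opaque);        /*    polling: handle it now    */
    }
    aio_set_fd_handler(ctx, fd, my_io_read, NULL,
                       my_io_poll, my_io_poll_ready, opaque);
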
I don't like pushing this responsibility onto the caller, but adding a
synchronization point in aio_set_fd_handler() is problematic, so let's
give it a try. I'll try that approach and send a v2.
Stefan
>
> Paolo
>
> >
> > Stefan Hajnoczi (3):
> > aio-posix: run aio_set_fd_handler() in target AioContext
> > aio: use counter instead of ctx->list_lock
> > aio-posix: call ->poll_end() when removing AioHandler
> >
> > include/block/aio.h | 22 ++---
> > util/aio-posix.c | 197 ++++++++++++++++++++++++++++++++------------
> > util/async.c | 2 -
> > util/fdmon-epoll.c | 6 +-
> > 4 files changed, 152 insertions(+), 75 deletions(-)
> >
> > --
> > 2.43.0
> >
>