qemu-devel
Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler


From: Hanna Czenczek
Subject: Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
Date: Tue, 2 Jan 2024 16:24:11 +0100
User-agent: Mozilla Thunderbird

On 13.12.23 22:15, Stefan Hajnoczi wrote:
> Hanna and Fiona encountered a bug in aio_set_fd_handler(): there is no
> matching io_poll_end() call upon removing an AioHandler when
> io_poll_begin() was previously called.  The missing io_poll_end() call
> leaves virtqueue notifications disabled, so the virtqueue's ioeventfd
> never becomes readable again.
>
> The details of how virtio-scsi devices using IOThreads can hang after
> hotplug/unplug are covered here:
> https://issues.redhat.com/browse/RHEL-3934
>
> Hanna is currently away over the December holidays.  I'm sending these
> RFC patches in the meantime.  They demonstrate running
> aio_set_fd_handler() in the AioContext home thread and adding the
> missing io_poll_end() call.
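
For reference, the io_poll_begin()/io_poll_end() pairing described above works roughly like the following.  This is a deliberately simplified sketch, not the actual util/aio-posix.c code; the struct layout and names below are invented for illustration only:

    #include <stdbool.h>
    #include <stddef.h>

    /* Simplified model of one poll handler (illustration only) */
    typedef struct PollHandlerSketch {
        void (*io_poll_begin)(void *opaque); /* disables vq notifications   */
        void (*io_poll_end)(void *opaque);   /* re-enables vq notifications */
        void *opaque;
        bool poll_started;                   /* io_poll_begin() was called  */
    } PollHandlerSketch;

    static void poll_phase_start(PollHandlerSketch *h)
    {
        if (h->io_poll_begin) {
            h->io_poll_begin(h->opaque);     /* notifications now disabled */
        }
        h->poll_started = true;
    }

    static void handler_remove(PollHandlerSketch *h)
    {
        /* The bug: the handler can be removed while poll_started is still
         * true, and nothing calls io_poll_end().  Notifications therefore
         * stay disabled and the ioeventfd never becomes readable again. */
        h->io_poll_begin = NULL;
        h->io_poll_end = NULL;
        h->opaque = NULL;
        h->poll_started = false;
    }

The point is that nothing on the removal path restores the notification state that io_poll_begin() changed.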

I agree with Paolo that if the handlers are removed, the caller probably isn’t interested in notifications: In our specific case, we remove the handlers because the device is to be drained, so we won’t poll the virtqueue anyway.  Therefore, if io_poll_end() is to be called to complete the start-end pair, it shouldn’t be done when the handlers are removed, but instead when they are reinstated.

I believe that’s quite infeasible to do in generic code: We’d basically have to remember that we haven’t yet called a specific io_poll_end() callback, and then, once that callback is reinstated while we’re not in a polling phase, call it at that point.  That in itself is already hairy, but in addition, the callback consists only of a function pointer and an opaque pointer, and there seems to be no reliable way to tell whether a newly instated io_poll_end() callback is the same one that was removed before (pointer identity may work, but it also may not).
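
Concretely, all that generic code could record and later compare is something like this (purely illustrative; the struct and field names are made up):

    /* What a "pending io_poll_end" entry would have to look like: */
    struct PendingPollEnd {
        void (*io_poll_end)(void *opaque); /* function pointer               */
        void *opaque;                      /* the memory behind this pointer */
                                           /* may have been freed and reused */
                                           /* by the time a handler is       */
                                           /* reinstated, so pointer         */
                                           /* equality proves nothing        */
    };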

That’s why I think the responsibility should fall on the caller.  I believe both virtio_queue_aio_attach_host_notifier*() functions should enable vq notifications before instating the handler – if we’re in polling mode, io_poll_begin() will then immediately be called and disable vq notifications again.  Having them enabled briefly shouldn’t hurt anything but performance (and I don’t think that matters much in the drain case).

For completeness’ sake, we may even have virtio_queue_aio_detach_host_notifier() disable vq notifications, because otherwise it’s unknown whether notifications are enabled or disabled after removing the notifiers.  That isn’t terrible, but I think (A) if we have to settle on one of the two states, we want them disabled, because we won’t check for requests anyway, and (B) this gives us a clearer state machine, where removing the notifiers always leaves vq notifications disabled, so when they are reinstated (i.e. when calling virtio_queue_aio_attach_host_notifier*()), it’s clear that we must poll once to check for new requests.
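
In code, the shape I have in mind is roughly the following.  This is only a sketch of the idea, not the attached patch; the aio_set_event_notifier() details are elided, and whether kicking the host notifier is the right way to “poll once” is exactly what the patch has to get right:

    /* Sketch against hw/virtio/virtio.c; assumes the usual virtio/aio
     * declarations are in scope. */

    void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
    {
        /* Enable notifications before instating the handlers, so the state
         * after attach is well defined.  If we are in a polling phase,
         * io_poll_begin() will disable them again right away. */
        virtio_queue_set_notification(vq, 1);

        /* ... instate the read/poll handlers for the host notifier via
         * aio_set_event_notifier() as before ... */

        /* Poll once: requests may have been queued while notifications
         * were disabled. */
        event_notifier_set(virtio_queue_get_host_notifier(vq));
    }

    void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
    {
        /* ... remove the handlers via aio_set_event_notifier() ... */

        /* Leave the vq in a known state: notifications disabled.  The next
         * attach is then responsible for re-enabling them and for checking
         * whether requests arrived in the meantime. */
        virtio_queue_set_notification(vq, 0);
    }

With that, attach and detach always leave the vq in a known notification state, which is the state machine described above.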

I’ve attached the preliminary patch that I didn’t get to send (or test much) last year.  I’m not sure whether it has the same CPU-usage-spike issue Fiona was seeing; the only functional difference is that I notify the vq after attaching the notifiers instead of before.

Hanna

Attachment: 0001-Keep-notifications-disabled-during-drain.patch
Description: Text Data

