
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC 1/3] aio-posix: add aio_set_poll_handler()
Date: Thu, 10 Nov 2016 10:17:35 +0000
User-agent: Mutt/1.7.1 (2016-10-04)

On Wed, Nov 09, 2016 at 06:30:11PM +0100, Paolo Bonzini wrote:

Thanks for the feedback.  I hope that Karl will be able to find a
QEMU_AIO_POLL_MAX_NS setting that improves the benchmark.  At that point
I'll send a new version of this series so we can iron out the details.

> > +static bool run_poll_handlers(AioContext *ctx)
> > +{
> > +    int64_t start_time;
> > +    unsigned int loop_count = 0;
> > +    bool fired = false;
> > +
> > +    /* Is there any polling to be done? */
> 
> I think the question is not "is there any polling to be done" but rather
> "is there anything that requires looking at a file descriptor".  If you
> have e.g. an NBD device on the AioContext you cannot poll.  On the other
> hand if all you have is bottom halves (which you can poll with
> ctx->notified), AIO and virtio ioeventfds, you can poll.

This is a good point.  Polling should only be done if every resource in
the AioContext benefits from it - otherwise polling just adds latency
for resources that don't support it.
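
Roughly something like this (just a sketch; it assumes each AioHandler
grows an io_poll callback, which is not part of this RFC):

static bool aio_can_poll(AioContext *ctx)
{
    AioHandler *node;

    /* Only enter the polling loop if every handler on this AioContext
     * has registered a poll callback.  A single non-pollable fd
     * (e.g. an NBD socket) means we must fall back to ppoll() so it
     * doesn't see extra latency.
     */
    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
        if (!node->io_poll) {   /* assumed per-handler poll callback */
            return false;
        }
    }
    return true;
}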

Another point: only poll when there is actually work to be done.  Linux
AIO should only poll the completion ring when there are >0 requests
outstanding.  Currently it always polls (doh!).
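
For linux-aio the check could be as simple as this (sketch only;
io_q.in_flight is assumed to count submitted-but-uncompleted requests
and qemu_laio_poll_completions() is a made-up name for "harvest the
completion ring without blocking"):

static bool laio_poll_handler(void *opaque)
{
    LinuxAioState *s = opaque;

    /* Don't spin on an idle ring */
    if (s->io_q.in_flight == 0) {
        return false;
    }

    /* Assumed helper: reap any completions already in the ring,
     * returning true if progress was made.
     */
    return qemu_laio_poll_completions(s);
}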

> In particular, testing for bottom halves is necessary to avoid incurring
> extra latency on flushes, which use the thread pool.

The current code has a half-solution: it calls aio_compute_timeout() to
see whether any existing BHs are ready to execute *before* beginning to
poll.

Really we should also poll for BHs, since they can be scheduled during
the polling loop itself.
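
Something along these lines inside the polling loop would cover them
(sketch; run_poll_handlers_once() is a stand-in for one pass over the
registered poll handlers):

static bool run_poll_handlers(AioContext *ctx)
{
    int64_t start_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);

    do {
        /* aio_notify() sets ctx->notified, so a BH scheduled while we
         * spin is picked up here instead of waiting for the next
         * ppoll().
         */
        if (atomic_read(&ctx->notified)) {
            return true;
        }
        if (run_poll_handlers_once(ctx)) {
            return true;
        }
    } while (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time <
             QEMU_AIO_POLL_MAX_NS);

    return false;
}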

> Perhaps the poll handler could be a parameter to aio_set_event_notifier?
>  run_poll_handlers can just set revents (to G_IO_IN for example) if the
> polling handler returns true, and return true as well.  aio_poll can
> then call aio_notify_accept and aio_dispatch, bypassing the poll system
> call altogether.

This is problematic.  The poll source is not the file descriptor
itself, so there is a race condition:

1. Guest increments virtqueue avail.idx

2. QEMU poll notices avail.idx update and marks fd.revents readable.

3. QEMU dispatches fd handler:

void virtio_queue_host_notifier_read(EventNotifier *n)
{
    VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
    if (event_notifier_test_and_clear(n)) {
        virtio_queue_notify_vq(vq);
    }
}

4. Guest kicks virtqueue -> ioeventfd is signalled

Unfortunately polling is "too fast" and event_notifier_test_and_clear()
returns false; we won't process the virtqueue!

Pretending that polling is the same as fd monitoring only works when #4
happens before #3.  We have to solve this race condition.

The simplest solution is to get rid of the if statement (i.e. allow
spurious event processing).  I'm not sure whether that has a drawback,
though.
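
Concretely it would look roughly like this (untested sketch of the
"spurious processing" variant):

void virtio_queue_host_notifier_read(EventNotifier *n)
{
    VirtQueue *vq = container_of(n, VirtQueue, host_notifier);

    /* Clear the notifier if it was set, but process the virtqueue
     * unconditionally: polling may have seen the avail.idx update
     * before the guest's kick (step 4) arrived.
     */
    event_notifier_test_and_clear(n);
    virtio_queue_notify_vq(vq);
}

The cost is an occasional spurious virtio_queue_notify_vq() call when
the notifier really was idle, which should be cheap because the
virtqueue will simply be empty.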

Do you have a nicer solution in mind?


