From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll
Date: Tue, 7 Jul 2015 16:08:50 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, Jun 30, 2015 at 09:19:45PM +0800, Fam Zheng wrote:
> =====================================================================
>   # of scsi-disks  |        master           |       epoll
>                    |   rd     wr    randrw   |   rd    wr    randrw
> ---------------------------------------------------------------------
>         1          |   103    96     49      |   105   99     49
>         4          |   92     96     48      |   103   98     49
>         8          |   96     94     46      |   101   97     50
>         16         |   91     91     45      |   101   95     48
>         32         |   84     83     40      |   95    95     48
>         64         |   75     73     35      |   91    90     44
>         128        |   54     53     26      |   79    80     39
>         256        |   41     39     19      |   63    62     30
> =====================================================================

Nice results!

> @@ -44,6 +47,12 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
>  
>  void aio_context_setup(AioContext *ctx, Error **errp)
>  {
> +#ifdef CONFIG_EPOLL
> +    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
> +    if (ctx->epollfd < 0) {
> +        error_setg(errp, "Failed to create epoll fd: %s", strerror(errno));

Slightly more concise:

  error_setg_errno(errp, errno, "Failed to create epoll fd");
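
error_setg_errno() appends the strerror() text for you, so the explicit
%s/strerror(errno) pair can go away.  The added block would then read
roughly:

  ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
  if (ctx->epollfd < 0) {
      error_setg_errno(errp, errno, "Failed to create epoll fd");
  }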

> -/* These thread-local variables are used only in a small part of aio_poll
> +#ifdef CONFIG_EPOLL
> +QEMU_BUILD_BUG_ON((int)G_IO_IN != EPOLLIN);
> +QEMU_BUILD_BUG_ON((int)G_IO_OUT != EPOLLOUT);
> +QEMU_BUILD_BUG_ON((int)G_IO_PRI != EPOLLPRI);
> +QEMU_BUILD_BUG_ON((int)G_IO_ERR != EPOLLERR);
> +QEMU_BUILD_BUG_ON((int)G_IO_HUP != EPOLLHUP);

I guess this assumption is okay, but maybe the compiler can optimize an
explicit mapping like:

  event.events = (node->pfd.events & G_IO_IN ? EPOLLIN : 0) |
                 (node->pfd.events & G_IO_OUT ? EPOLLOUT : 0) |
                 (node->pfd.events & G_IO_PRI ? EPOLLPRI : 0) |
                 (node->pfd.events & G_IO_ERR ? EPOLLERR : 0) |
                 (node->pfd.events & G_IO_HUP ? EPOLLHUP : 0);

into:

  event.events = node->pfd.events & (EPOLLIN | EPOLLOUT | EPOLLPRI |
                                      EPOLLERR | EPOLLHUP);

which is just an AND instruction, so it's effectively free and doesn't
assume that these constants have the same values.
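
If you go that way, the mapping could live in a pair of tiny helpers so
the dispatch path can translate epoll revents back into GIOCondition the
same way (the names below are just a sketch, not something from the
patch):

  /* needs <glib.h> and <sys/epoll.h> */
  static inline uint32_t epoll_events_from_gio(int gio)
  {
      return (gio & G_IO_IN  ? EPOLLIN  : 0) |
             (gio & G_IO_OUT ? EPOLLOUT : 0) |
             (gio & G_IO_PRI ? EPOLLPRI : 0) |
             (gio & G_IO_ERR ? EPOLLERR : 0) |
             (gio & G_IO_HUP ? EPOLLHUP : 0);
  }

  static inline int gio_from_epoll_events(uint32_t events)
  {
      return (events & EPOLLIN  ? G_IO_IN  : 0) |
             (events & EPOLLOUT ? G_IO_OUT : 0) |
             (events & EPOLLPRI ? G_IO_PRI : 0) |
             (events & EPOLLERR ? G_IO_ERR : 0) |
             (events & EPOLLHUP ? G_IO_HUP : 0);
  }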

> +
> +#define EPOLL_BATCH 128
> +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
> +{
> +    AioHandler *node;
> +    bool was_dispatching;
> +    int i, ret;
> +    bool progress;
> +    int64_t timeout;
> +    struct epoll_event events[EPOLL_BATCH];
> +
> +    aio_context_acquire(ctx);
> +    was_dispatching = ctx->dispatching;
> +    progress = false;
> +
> +    /* aio_notify can avoid the expensive event_notifier_set if
> +     * everything (file descriptors, bottom halves, timers) will
> +     * be re-evaluated before the next blocking poll().  This is
> +     * already true when aio_poll is called with blocking == false;
> +     * if blocking == true, it is only true after poll() returns.
> +     *
> +     * If we're in a nested event loop, ctx->dispatching might be true.
> +     * In that case we can restore it just before returning, but we
> +     * have to clear it now.
> +     */
> +    aio_set_dispatching(ctx, !blocking);
> +
> +    ctx->walking_handlers++;
> +
> +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
> +
> +    if (timeout > 0) {
> +        timeout = DIV_ROUND_UP(timeout, 1000000);
> +    }

I think you already posted the timerfd code in an earlier series.  Why
degrade to millisecond precision?  It needs to be fixed up anyway if the
main loop uses aio_poll() in the future.
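
To sketch what I mean (the ctx->timerfd field and its setup are an
assumption here, not something in this patch): keep the nanosecond
timeout from aio_compute_timeout(), arm a timerfd that is registered on
ctx->epollfd, and let epoll_wait() block indefinitely:

  /* needs <sys/timerfd.h>; assumes ctx->timerfd was created with
   * timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC) and added to
   * ctx->epollfd in aio_context_setup()
   */
  if (timeout > 0) {
      struct itimerspec its = {
          .it_value.tv_sec  = timeout / 1000000000LL,
          .it_value.tv_nsec = timeout % 1000000000LL,
      };
      timerfd_settime(ctx->timerfd, 0, &its, NULL);  /* full ns precision */
      ret = epoll_wait(ctx->epollfd, events, EPOLL_BATCH, -1);
      /* when the timerfd fires it shows up as a normal epoll event;
       * read(2) it to clear the expiration count */
  } else {
      /* timeout == 0: non-blocking poll; timeout < 0: block until an event */
      ret = epoll_wait(ctx->epollfd, events, EPOLL_BATCH,
                       timeout < 0 ? -1 : 0);
  }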
