
Re: [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread


From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread
Date: Thu, 4 Dec 2014 15:12:01 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On 02.12.2014 at 15:33, Peter Lieven wrote:
> this patch finally introduces multiread support to virtio-blk. While
> multiwrite support has been there for a long time, read support was missing.
> 
> To achieve this the patch does several things which might need further
> explanation:
> 
>  - the whole merge and multireq logic is moved from block.c into
>    virtio-blk. This is move is a preparation for directly creating a
>    coroutine out of virtio-blk.
> 
>  - requests are only merged if they are strictly sequential and no
>    longer sorted. This simplification decreases overhead and reduces
>    latency. It will also merge some requests which were unmergeable before.
> 
>    The old algorithm took up to 32 requests, sorted them and tried to merge
>    them. The outcome was anything between 1 and 32 requests. In the case of
>    32 requests, 31 requests were unnecessarily delayed.
> 
>    On the other hand, let's imagine e.g. 16 unmergeable requests followed
>    by 32 mergeable requests. The latter 32 requests would have been split
>    into two merged requests of 16 requests each.
> 
>    Lastly, the simplified logic allows for a fast path if we have only a
>    single request in the multirequest. In this case the request is sent as
>    an ordinary request without multireq callbacks.
> 
> As a first benchmark I installed Ubuntu 14.04.1 on a ramdisk. The number of
> merged requests is in the same order while the latency is slightly decreased.
> One should not read too much into the numbers because the number of wr_requests
> is highly fluctuant. I hope the numbers show that this patch is at least
> not causing too much harm:
> 
> Cmdline:
> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>  -drive if=virtio,file=/tmp/ubuntu.raw -monitor stdio
> 
> Before:
> virtio0: rd_bytes=150979072 wr_bytes=2591138816 rd_operations=18475 wr_operations=69216
>          flush_operations=15343 wr_total_time_ns=26969283701 rd_total_time_ns=1018449432
>          flush_total_time_ns=562596819 rd_merged=0 wr_merged=10453
> 
> After:
> virtio0: rd_bytes=148112896 wr_bytes=2845834240 rd_operations=17535 wr_operations=72197
>          flush_operations=15760 wr_total_time_ns=26104971623 rd_total_time_ns=1012926283
>          flush_total_time_ns=564414752 rd_merged=517 wr_merged=9859
> 
> Some first numbers of improved read performance while booting:
> 
> The Ubuntu 14.04.1 vServer from above:
> virtio0: rd_bytes=86158336 wr_bytes=688128 rd_operations=5851 wr_operations=70
>          flush_operations=4 wr_total_time_ns=10886705 rd_total_time_ns=416688595
>          flush_total_time_ns=288776 rd_merged=1297 wr_merged=2
> 
> Windows 2012R2:
> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>          flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>          flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
> 
> Signed-off-by: Peter Lieven <address@hidden>
> ---
>  hw/block/dataplane/virtio-blk.c |   10 +-
>  hw/block/virtio-blk.c           |  222 ++++++++++++++++++++++++---------------
>  include/hw/virtio/virtio-blk.h  |   23 ++--
>  3 files changed, 156 insertions(+), 99 deletions(-)
> 
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 1222a37..aa4ad91 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -96,9 +96,8 @@ static void handle_notify(EventNotifier *e)
>      event_notifier_test_and_clear(&s->host_notifier);
>      blk_io_plug(s->conf->conf.blk);
>      for (;;) {
> -        MultiReqBuffer mrb = {
> -            .num_writes = 0,
> -        };
> +        MultiReqBuffer mrb_rd = {};
> +        MultiReqBuffer mrb_wr = {.is_write = 1};
>          int ret;
>  
>          /* Disable guest->host notifies to avoid unnecessary vmexits */
> @@ -117,10 +116,11 @@ static void handle_notify(EventNotifier *e)
>                                                          req->elem.in_num,
>                                                          req->elem.index);
>  
> -            virtio_blk_handle_request(req, &mrb);
> +            virtio_blk_handle_request(req, &mrb_wr, &mrb_rd);
>          }
>  
> -        virtio_submit_multiwrite(s->conf->conf.blk, &mrb);
> +        virtio_submit_multireq(s->conf->conf.blk, &mrb_wr);
> +        virtio_submit_multireq(s->conf->conf.blk, &mrb_rd);
>  
>          if (likely(ret == -EAGAIN)) { /* vring emptied */
>              /* Re-enable guest->host notifies and stop processing the vring.
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 490f961..f522bfc 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -22,12 +22,15 @@
>  #include "dataplane/virtio-blk.h"
>  #include "migration/migration.h"
>  #include "block/scsi.h"
> +#include "block/block_int.h"

No. :-)

We could either expose the information that you need through
BlockBackend, or wait until your automatic request splitting logic is
implemented in block.c and the information isn't needed here any more.
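For what it's worth, the first option might look roughly like the sketch below. The accessor name is my invention and no such function exists at this point; the stand-in structs only mimic the shape of the real QEMU types:

```c
#include <assert.h>

/* Stand-ins for the real QEMU types; only the shape matters here. */
typedef struct { int max_transfer_length; } BlockLimits;
typedef struct { BlockLimits bl; } BlockDriverState;
typedef struct { BlockDriverState *bs; } BlockBackend;

/* Hypothetical accessor: virtio-blk could query the backend limit
 * through BlockBackend instead of including block_int.h directly. */
static int blk_get_max_transfer_length(BlockBackend *blk)
{
    return blk->bs->bl.max_transfer_length;
}
```

That would keep block_int.h private to the block layer, which is the whole point of the BlockBackend abstraction.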

>  #ifdef __linux__
>  # include <scsi/sg.h>
>  #endif
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
>  
> +/* #define DEBUG_MULTIREQ */
> +
>  VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
>  {
>      VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
> @@ -88,6 +91,11 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
>  
>      trace_virtio_blk_rw_complete(req, ret);
>  
> +#ifdef DEBUG_MULTIREQ
> +    printf("virtio_blk_rw_complete p %p ret %d\n",
> +           req, ret);
> +#endif

In the final version, please use tracepoints instead of printfs.

>      if (ret) {
>          int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>          bool is_read = !(p & VIRTIO_BLK_T_OUT);
> @@ -257,24 +265,63 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
>      virtio_blk_free_request(req);
>  }
>  
> -void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
> +static void virtio_multireq_cb(void *opaque, int ret)
> +{
> +    MultiReqBuffer *mrb = opaque;
> +    int i;
> +#ifdef DEBUG_MULTIREQ
> +    printf("virtio_multireq_cb: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n",
> +           mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs);
> +#endif
> +    for (i = 0; i < mrb->num_reqs; i++) {
> +        virtio_blk_rw_complete(mrb->reqs[i], ret);
> +    }
> +
> +    qemu_iovec_destroy(&mrb->qiov);
> +    g_free(mrb);
> +}
> +
> +void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb0)
>  {
> -    int i, ret;
> +    MultiReqBuffer *mrb = NULL;
>  
> -    if (!mrb->num_writes) {
> +    if (!mrb0->num_reqs) {
>          return;
>      }
>  
> -    ret = blk_aio_multiwrite(blk, mrb->blkreq, mrb->num_writes);
> -    if (ret != 0) {
> -        for (i = 0; i < mrb->num_writes; i++) {
> -            if (mrb->blkreq[i].error) {
> -                virtio_blk_rw_complete(mrb->blkreq[i].opaque, -EIO);
> -            }
> +    if (mrb0->num_reqs == 1) {
> +        if (mrb0->is_write) {
> +            blk_aio_writev(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
> +                           virtio_blk_rw_complete, mrb0->reqs[0]);
> +        } else {
> +            blk_aio_readv(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
> +                          virtio_blk_rw_complete, mrb0->reqs[0]);
>          }
> +        qemu_iovec_destroy(&mrb0->qiov);
> +        mrb0->num_reqs = 0;
> +        return;
> +    }
> +
> +    mrb = g_malloc(sizeof(MultiReqBuffer));
> +    memcpy(mrb, mrb0, sizeof(MultiReqBuffer));
> +    mrb0->num_reqs = 0;

We probably want to avoid allocations as much as we can. If at all
possible, we don't want to memcpy() either.

Most parts of MultiReqBuffer don't seem to be actually needed during the
request and in the callback any more. Essentially, all that is needed is
the qiov and a way to find all the requests that belong to the same
MultiReqBuffer, so that they can all be completed.

We could create a linked list by having a VirtIOBlockReq *next in the
request struct. For the qiov we can probably just use the field in the
first request (i.e. we extend the qiov of the first request).

With these requirements removed, MultiReqBuffer can live on the stack
and doesn't need to be copied to the heap.
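As a rough illustration of the chaining idea (stand-in types, not the real QEMU structs; names like mr_next are just a suggestion):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for VirtIOBlockReq with the suggested chaining pointer. */
typedef struct Req {
    int completed;
    struct Req *mr_next;   /* links requests merged into one multireq */
} Req;

/* On completion of the merged request, walk the chain and complete
 * every guest request that was folded into it. This replaces the
 * reqs[] array that currently forces MultiReqBuffer onto the heap. */
static void multireq_complete_chain(Req *head, int ret)
{
    while (head != NULL) {
        Req *next = head->mr_next;
        (void)ret;            /* would be passed to virtio_blk_rw_complete() */
        head->completed = 1;  /* stand-in for the real completion call */
        head = next;
    }
}
```

The callback then only needs the head of the chain as its opaque pointer, so nothing from MultiReqBuffer has to outlive request submission.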


With the additional patch that you send me, the next few lines read:

+    qemu_iovec_init(&mrb->qiov, mrb->num_reqs);
+
+    for (i = 0; i < mrb->num_reqs; i++) {
+        qemu_iovec_concat(&mrb->qiov, &mrb->reqs[i]->qiov, 0, mrb->reqs[i]->qiov.size);
+    }
+    assert(mrb->nb_sectors == mrb->qiov.size / BDRV_SECTOR_SIZE);

The long line is more than 80 characters, but probably more importantly,
mrb->num_reqs is a really bad estimation for niov. If the guest uses any
vectored I/O, we're sure to get reallocations. What you really want to
use here is the old mrb->qiov.niov (which is an abuse to store in the
qiov, see below).
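To illustrate why (types are stand-ins, only the counting logic matters): sizing the merged vector by the number of requests undercounts as soon as any guest request is itself vectored, whereas summing the per-request iovec counts sizes it exactly.

```c
#include <assert.h>

/* Each request carries a vectored buffer with `niov` elements. */
typedef struct { int niov; } Qiov;
typedef struct { Qiov qiov; } Req;

/* Correct allocation hint for the merged qiov: sum the per-request
 * iovec counts instead of counting requests. */
static int merged_niov(Req **reqs, int num_reqs)
{
    int total = 0;
    for (int i = 0; i < num_reqs; i++) {
        total += reqs[i]->qiov.niov;
    }
    return total;
}
```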

(I also wonder whether it would make sense to keep qiovs allocated when
putting VirtIOBlockReqs into the pool, but that's not for this series.)

> +
> +#ifdef DEBUG_MULTIREQ
> +    printf("virtio_submit_multireq: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n",
> +           mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs);
> +#endif
> +
> +    if (mrb->is_write) {
> +        blk_aio_writev(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors,
> +                       virtio_multireq_cb, mrb);
> +    } else {
> +        blk_aio_readv(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors,
> +                      virtio_multireq_cb, mrb);
>      }
>  
> -    mrb->num_writes = 0;
> +    block_acct_merge_done(blk_get_stats(blk),
> +                          mrb->is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ,
> +                          mrb->num_reqs - 1);
>  }
>  
>  static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> @@ -283,9 +330,9 @@ static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>                       BLOCK_ACCT_FLUSH);
>  
>      /*
> -     * Make sure all outstanding writes are posted to the backing device.
> +     * Make sure all outstanding requests are posted to the backing device.
>       */
> -    virtio_submit_multiwrite(req->dev->blk, mrb);
> +    virtio_submit_multireq(req->dev->blk, mrb);
>      blk_aio_flush(req->dev->blk, virtio_blk_flush_complete, req);
>  }
>  
> @@ -308,61 +355,8 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
>      return true;
>  }
>  
> -static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> -{
> -    BlockRequest *blkreq;
> -    uint64_t sector;
> -
> -    sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
> -
> -    trace_virtio_blk_handle_write(req, sector, req->qiov.size / 512);
> -
> -    if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) {
> -        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> -        virtio_blk_free_request(req);
> -        return;
> -    }
> -
> -    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
> -                     BLOCK_ACCT_WRITE);
> -
> -    if (mrb->num_writes == VIRTIO_BLK_MAX_MERGE_REQS) {
> -        virtio_submit_multiwrite(req->dev->blk, mrb);
> -    }
> -
> -    blkreq = &mrb->blkreq[mrb->num_writes];
> -    blkreq->sector = sector;
> -    blkreq->nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> -    blkreq->qiov = &req->qiov;
> -    blkreq->cb = virtio_blk_rw_complete;
> -    blkreq->opaque = req;
> -    blkreq->error = 0;
> -
> -    mrb->num_writes++;
> -}
> -
> -static void virtio_blk_handle_read(VirtIOBlockReq *req)
> -{
> -    uint64_t sector;
> -
> -    sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
> -
> -    trace_virtio_blk_handle_read(req, sector, req->qiov.size / 512);
> -
> -    if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) {
> -        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> -        virtio_blk_free_request(req);
> -        return;
> -    }
> -
> -    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
> -                     BLOCK_ACCT_READ);
> -    blk_aio_readv(req->dev->blk, sector, &req->qiov,
> -                  req->qiov.size / BDRV_SECTOR_SIZE,
> -                  virtio_blk_rw_complete, req);
> -}

Handling reads and writes in a common place makes sense, but the code is
long enough that I would still keep it in a separate function, like
virtio_blk_handle_rw().

> -void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> +void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb_wr,
> +                               MultiReqBuffer *mrb_rd)
>  {
>      uint32_t type;
>      struct iovec *in_iov = req->elem.in_sg;
> @@ -397,7 +391,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>      type = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>  
>      if (type & VIRTIO_BLK_T_FLUSH) {
> -        virtio_blk_handle_flush(req, mrb);
> +        virtio_blk_handle_flush(req, mrb_wr);
>      } else if (type & VIRTIO_BLK_T_SCSI_CMD) {
>          virtio_blk_handle_scsi(req);
>      } else if (type & VIRTIO_BLK_T_GET_ID) {
> @@ -414,13 +408,71 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          iov_from_buf(in_iov, in_num, 0, serial, size);
>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>          virtio_blk_free_request(req);
> -    } else if (type & VIRTIO_BLK_T_OUT) {
> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
> -        virtio_blk_handle_write(req, mrb);
> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> -        virtio_blk_handle_read(req);
> +    } else if (type & VIRTIO_BLK_T_OUT || type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> +        bool is_write = type & VIRTIO_BLK_T_OUT;
> +        MultiReqBuffer *mrb = is_write ? mrb_wr : mrb_rd;
> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
> +        BlockDriverState *bs = blk_bs(req->dev->blk);
> +        int nb_sectors = 0;
> +        int merge = 1;
> +
> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> +            virtio_blk_free_request(req);
> +            return;
> +        }
> +
> +        if (is_write) {
> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
> +            trace_virtio_blk_handle_write(req, sector_num, req->qiov.size / BDRV_SECTOR_SIZE);
> +        } else {
> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> +            trace_virtio_blk_handle_read(req, sector_num, req->qiov.size / BDRV_SECTOR_SIZE);
> +        }
> +
> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> +#ifdef DEBUG_MULTIREQ
> +        printf("virtio_blk_handle_request p %p sector_num %ld nb_sectors %d is_write %d\n",
> +               req, sector_num, nb_sectors, is_write);
> +#endif
> +
> +        block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
> +                         is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
> +
> +        /* merge would exceed maximum number of requests or IOVs */
> +        if (mrb->num_reqs == MAX_MERGE_REQS ||
> +            mrb->qiov.niov + req->qiov.niov + 1 > IOV_MAX) {
> +            merge = 0;
> +        }

Again with your latest patch applied on top:

You're abusing mrb->qiov.niov while gathering requests. mrb->qiov is
never initialised and you don't actually add iovs anywhere, but you just
increase qiov.niov without the associated data.

Why don't you make it a simple int instead of a QEMUIOVector?
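Something along these lines (field names and the limit value are illustrative, not the real QEMU definitions):

```c
#include <assert.h>

/* Illustrative stand-in for IOV_MAX. */
enum { FAKE_IOV_MAX = 1024 };

/* Sketch: keep a plain running iovec count in the buffer while
 * gathering requests, instead of bumping niov on an uninitialised
 * QEMUIOVector that carries no actual iovec data yet. */
typedef struct {
    int num_reqs;
    int niov;          /* running total of iovecs gathered so far */
} Mrb;

/* Mirrors the merge check in the patch, including the reserved extra
 * slot from the original `+ 1` in the condition. */
static int fits_iov_limit(const Mrb *mrb, int req_niov)
{
    return mrb->niov + req_niov + 1 <= FAKE_IOV_MAX;
}
```

The real qiov would then only be initialised and filled at submission time, where the summed count is already known.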

> +
> +        /* merge would exceed maximum transfer length of backend device */
> +        if (bs->bl.max_transfer_length &&
> +            mrb->nb_sectors + nb_sectors > bs->bl.max_transfer_length) {
> +            merge = 0;
> +        }
> +
> +        /* requests are not sequential */
> +        if (mrb->num_reqs && mrb->sector_num + mrb->nb_sectors != sector_num) {
> +            merge = 0;
> +        }
> +
> +        if (!merge) {
> +            virtio_submit_multireq(req->dev->blk, mrb);
> +        }
> +
> +        if (mrb->num_reqs == 0) {
> +            qemu_iovec_init(&mrb->qiov, MAX_MERGE_REQS);
> +            mrb->sector_num = sector_num;
> +            mrb->nb_sectors = 0;
> +        }
> +
> +        qemu_iovec_concat(&mrb->qiov, &req->qiov, 0, req->qiov.size);
> +        mrb->nb_sectors += req->qiov.size / BDRV_SECTOR_SIZE;
> +
> +        assert(mrb->nb_sectors == mrb->qiov.size / BDRV_SECTOR_SIZE);
> +
> +        mrb->reqs[mrb->num_reqs] = req;
> +        mrb->num_reqs++;
>      } else {
>          virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
>          virtio_blk_free_request(req);

Otherwise it looks as if it could work.

Kevin


