Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
Date: Mon, 15 Dec 2014 16:01:07 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On 09.12.2014 at 17:26, Peter Lieven wrote:
> This patch finally introduces multiread support to virtio-blk. While
> multiwrite support has been there for a long time, an equivalent for reads was missing.
> 
> To achieve this the patch does several things which might need further
> explanation:
> 
>  - the whole merge and multireq logic is moved from block.c into
>    virtio-blk. This move is a preparation for directly creating a
>    coroutine out of virtio-blk.
> 
>  - requests are only merged if they are strictly sequential, and no
>    longer sorted. This simplification decreases overhead and reduces
>    latency. It will also merge some requests which were unmergeable before.
> 
>    The old algorithm took up to 32 requests, sorted them and tried to merge
>    them. The outcome was anything between 1 and 32 requests. In case of
>    32 requests there were 31 requests unnecessarily delayed.
> 
>    On the other hand, let's imagine e.g. 16 unmergeable requests followed
>    by 32 mergeable requests. The latter 32 requests would have been split
>    into two merged requests of 16 each.
> 
>    Lastly, the simplified logic allows for a fast path if we have only a
>    single request in the multirequest. In this case the request is sent as
>    an ordinary request without multireq callbacks.
> 
> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
> merged requests is of the same order, while the write latency visibly
> decreases by several percent.
> 
> cmdline:
> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>  -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
> 
> Before:
> virtio0:
>  rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>  flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>  flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
> 
> After:
> virtio0:
>  rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>  flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>  flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
> 
> Some first numbers showing the improved read performance while booting:
> 
> The Ubuntu 14.04.1 vServer from above:
> virtio0:
>  rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>  flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>  flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
> 
> Windows 2012R2 (booted from iSCSI):
> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>  flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>  flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
> 
> Signed-off-by: Peter Lieven <address@hidden>

Looks pretty good. The only thing I'm still unsure about is possible
integer overflows in the merging logic. Maybe you can have another look
there (ideally not only at the places I commented on below, but at the
whole function).
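
To put a number on it: with 512-byte sectors, a plain int sector count
wraps once a single request or merged total reaches 2^31 sectors, i.e.
1 TiB worth of guest-supplied buffer space. No sane guest submits that,
but since the sizes come straight from the vring, the limit has to be
enforced explicitly rather than assumed.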

> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          iov_from_buf(in_iov, in_num, 0, serial, size);
>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>          virtio_blk_free_request(req);
> -    } else if (type & VIRTIO_BLK_T_OUT) {
> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
> -        virtio_blk_handle_write(req, mrb);
> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> -        virtio_blk_handle_read(req);
> -    } else {
> +        break;
> +    }
> +    case VIRTIO_BLK_T_IN:
> +    case VIRTIO_BLK_T_OUT:
> +    {
> +        bool is_write = type & VIRTIO_BLK_T_OUT;
> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> +                                          &req->out.sector);
> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> +        int nb_sectors = 0;
> +        bool merge = true;
> +
> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> +            virtio_blk_free_request(req);
> +            return;
> +        }
> +
> +        if (is_write) {
> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
> +            trace_virtio_blk_handle_write(req, sector_num,
> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
> +        } else {
> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> +            trace_virtio_blk_handle_read(req, sector_num,
> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
> +        }
> +
> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;

qiov.size is controlled by the guest, and nb_sectors is only an int. Are
you sure that this can't overflow?
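
Something along these lines would rule it out, just as a sketch (where
exactly to put the check and which error to return is up to you):

    /* sketch only, not part of the patch: refuse sizes that do not fit
     * into an int worth of sectors before the narrowing assignment */
    if (req->qiov.size / BDRV_SECTOR_SIZE > INT_MAX) {
        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
        virtio_blk_free_request(req);
        return;
    }
    nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;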

> +        block_acct_start(blk_get_stats(req->dev->blk),
> +                         &req->acct, req->qiov.size,
> +                         is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
> +
> +        /* merge would exceed maximum number of requests or IOVs */
> +        if (mrb->num_reqs == MAX_MERGE_REQS ||
> +            mrb->niov + req->qiov.niov + 1 > IOV_MAX) {
> +            merge = false;
> +        }
> +
> +        /* merge would exceed maximum transfer length of backend device */
> +        if (max_transfer_length &&
> +            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> +            merge = false;
> +        }
> +
> +        /* requests are not sequential */
> +        if (mrb->num_reqs && mrb->sector_num + mrb->nb_sectors != sector_num) {
> +            merge = false;
> +        }
> +
> +        /* if we switch from read to write or vice versa we should submit
> +         * outstanding requests to avoid unnecessary and potentially long delays.
> +         * Furthermore we share the same struct for read and write merging so
> +         * submission is a must here. */
> +        if (is_write != mrb->is_write) {
> +            merge = false;
> +        }
> +
> +        if (!merge) {
> +            virtio_submit_multireq(req->dev->blk, mrb);
> +        }
> +
> +        if (mrb->num_reqs == 0) {
> +            mrb->sector_num = sector_num;
> +            mrb->is_write = is_write;
> +        }
> +
> +        mrb->nb_sectors += req->qiov.size / BDRV_SECTOR_SIZE;

This one could also be problematic with respect to overflows.
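
One way to close that hole would be to treat an over-wide total as just
another reason not to merge, up with the other checks (sketch only;
doing the comparison in int64_t is my assumption, not the patch):

    /* sketch: do the sum in 64 bit before it touches the int field, so
     * the merged total can never wrap around */
    if ((int64_t)mrb->nb_sectors + nb_sectors > INT_MAX) {
        merge = false;
    }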

> +        mrb->reqs[mrb->num_reqs] = req;
> +        mrb->niov += req->qiov.niov;
> +        mrb->num_reqs++;
> +        break;
> +    }
> +    default:
>          virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
>          virtio_blk_free_request(req);
>      }

Kevin


