Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread


From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
Date: Mon, 15 Dec 2014 17:02:45 +0100


> On 15.12.2014 at 17:00, Kevin Wolf <address@hidden> wrote:
> 
> On 15.12.2014 at 16:52, Peter Lieven wrote:
>> On 15.12.2014 16:43, Peter Lieven wrote:
>>> On 15.12.2014 16:01, Kevin Wolf wrote:
>>>> On 09.12.2014 at 17:26, Peter Lieven wrote:
>>>>> This patch finally introduces multiread support to virtio-blk. While
>>>>> multiwrite support has been there for a long time, read support was missing.
>>>>> 
>>>>> To achieve this the patch does several things which might need further
>>>>> explanation:
>>>>> 
>>>>> - the whole merge and multireq logic is moved from block.c into
>>>>>   virtio-blk. This move is a preparation for directly creating a
>>>>>   coroutine out of virtio-blk.
>>>>> 
>>>>> - requests are only merged if they are strictly sequential, and they are
>>>>>   no longer sorted. This simplification decreases overhead and reduces
>>>>>   latency. It will also merge some requests which were unmergeable before.
>>>>> 
>>>>>   The old algorithm took up to 32 requests, sorted them and tried to merge
>>>>>   them. The outcome was anything between 1 and 32 requests. In the case of
>>>>>   32 resulting requests, 31 of them were unnecessarily delayed.
>>>>> 
>>>>>   On the other hand, imagine e.g. 16 unmergeable requests followed
>>>>>   by 32 mergeable requests. The latter 32 requests would have been split
>>>>>   into two merged requests of 16 each.
>>>>> 
>>>>>   Lastly, the simplified logic allows for a fast path if we have only a
>>>>>   single request in the multirequest. In this case the request is sent as
>>>>>   an ordinary request without multireq callbacks.
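
To make the "strictly sequential" rule concrete, here is a minimal editorial
sketch of such a merge decision (the struct fields and function name are
simplified stand-ins, not the patch's actual code):

    #include <stdbool.h>
    #include <stdint.h>

    /* Minimal stand-in for the patch's MultiReqBuffer, with just the
     * fields the merge decision needs. */
    typedef struct MultiReqBuffer {
        unsigned int num_reqs;  /* requests queued so far */
        bool is_write;          /* direction of the queued requests */
        int64_t sector_num;     /* start sector of the merged request */
        int64_t nb_sectors;     /* merged length so far, in sectors */
    } MultiReqBuffer;

    /* Merge only if the new request continues exactly where the merged
     * request ends -- no sorting, no gap filling.  max_transfer_length
     * is assumed to be already clamped (e.g. to INT_MAX). */
    static bool multireq_can_merge(const MultiReqBuffer *mrb,
                                   int64_t sector_num, int64_t nb_sectors,
                                   bool is_write, int64_t max_transfer_length)
    {
        return mrb->num_reqs > 0 &&
               mrb->is_write == is_write &&
               sector_num == mrb->sector_num + mrb->nb_sectors &&
               mrb->nb_sectors + nb_sectors <= max_transfer_length;
    }

The fast path then falls out naturally: if only one request is in the buffer
when it is flushed, it can be submitted as an ordinary read or write without
the multireq completion callbacks.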
>>>>> 
>>>>> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number
>>>>> of merged requests is in the same order, while the write latency is clearly
>>>>> decreased by several percent.
>>>>> 
>>>>> cmdline:
>>>>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm \
>>>>>     -cdrom ubuntu-14.04.1-server-amd64.iso \
>>>>>     -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none \
>>>>>     -monitor stdio
>>>>> 
>>>>> Before:
>>>>> virtio0:
>>>>> rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>>>>> flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>>>>> flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>>>> 
>>>>> After:
>>>>> virtio0:
>>>>> rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>>>>> flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>>>>> flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>>>> 
>>>>> Some first numbers of improved read performance while booting:
>>>>> 
>>>>> The Ubuntu 14.04.1 vServer from above:
>>>>> virtio0:
>>>>> rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>>>>> flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>>>>> flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>>>> 
>>>>> Windows 2012R2 (booted from iSCSI):
>>>>> virtio0:
>>>>> rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>>>> flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>>>> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>>>> 
>>>>> Signed-off-by: Peter Lieven <address@hidden>
>>>> Looks pretty good. The only thing I'm still unsure about is possible
>>>> integer overflows in the merging logic. Maybe you can have another look
>>>> there (ideally not only the places I commented on below, but the whole
>>>> function).
>>>> 
>>>>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>>>         iov_from_buf(in_iov, in_num, 0, serial, size);
>>>>>         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>>>>         virtio_blk_free_request(req);
>>>>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>>>>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>>>>> -        virtio_blk_handle_write(req, mrb);
>>>>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>>>>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>>>>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>>>> -        virtio_blk_handle_read(req);
>>>>> -    } else {
>>>>> +        break;
>>>>> +    }
>>>>> +    case VIRTIO_BLK_T_IN:
>>>>> +    case VIRTIO_BLK_T_OUT:
>>>>> +    {
>>>>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>>>>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>>>>> +                                          &req->out.sector);
>>>>> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>>>> +        int nb_sectors = 0;
>>>>> +        bool merge = true;
>>>>> +
>>>>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>>>>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>>>>> +            virtio_blk_free_request(req);
>>>>> +            return;
>>>>> +        }
>>>>> +
>>>>> +        if (is_write) {
>>>>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>>>>> +            trace_virtio_blk_handle_write(req, sector_num,
>>>>> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
>>>>> +        } else {
>>>>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>>>> +            trace_virtio_blk_handle_read(req, sector_num,
>>>>> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
>>>>> +        }
>>>>> +        }
>>>>> +
>>>>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
>>>> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
>>>> you sure that this can't overflow?
>>> 
>>> In theory it can. For this to happen, in_iov or iov would need to contain
>>> 2TB of data on 32-bit systems. But theoretically there could
>>> also already be an overflow in qemu_iovec_init_external, where
>>> multiple size_t values are summed up in a size_t.
>>> 
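
As a back-of-the-envelope illustration of the truncation being discussed
(an editorial sketch, not from the thread; the exact threshold depends on
the platform's int and size_t widths):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        /* On a 64-bit host: one sector more than INT_MAX full sectors. */
        size_t qiov_size = (size_t)INT_MAX * 512 + 512;   /* 1 TiB */
        int nb_sectors = qiov_size / 512;   /* exceeds INT_MAX; the
                                               conversion to int is
                                               implementation-defined and
                                               typically wraps negative */
        printf("size=%zu -> nb_sectors=%d\n", qiov_size, nb_sectors);
        return 0;
    }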
>>> There has been no overflow checking in the merge routine in
>>> the past, but if it makes you feel better, we could add something like this:
>>> 
>>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>>> index cc0076a..e9236da 100644
>>> --- a/hw/block/virtio-blk.c
>>> +++ b/hw/block/virtio-blk.c
>>> @@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>        bool is_write = type & VIRTIO_BLK_T_OUT;
>>>        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>>>                                          &req->out.sector);
>>> -        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>> -        int nb_sectors = 0;
>>> +        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>> +        int64_t nb_sectors = 0;
>>>        bool merge = true;
>>> 
>>>        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>>> @@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>        }
>>> 
>>>        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
>>> +        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
>>> 
>>>        block_acct_start(blk_get_stats(req->dev->blk),
>>>                         &req->acct, req->qiov.size,
>>> @@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>        }
>>> 
>>>        /* merge would exceed maximum transfer length of backend device */
>>> -        if (max_transfer_length &&
>>> -            mrb->nb_sectors + nb_sectors > max_transfer_length) {
>>> +        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
>>>            merge = false;
>>>        }
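
For readers without the macro in their head: QEMU's MIN_NON_ZERO picks the
smaller of two limits while treating 0 as "no limit"; it is defined roughly
like this (paraphrased from the QEMU headers):

    #define MIN_NON_ZERO(a, b) ((a) == 0 ? (b) : \
                                ((b) == 0 ? (a) : MIN(a, b)))

With the clamp in place, a backend that reports 0 ("unlimited") is treated
as INT_MAX-limited, which is why the explicit "max_transfer_length &&" guard
can be dropped from the merge condition in the hunk above.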
>> 
>> Maybe also this here:
>> 
>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>> index cc0076a..fa647b6 100644
>> --- a/hw/block/virtio-blk.c
>> +++ b/hw/block/virtio-blk.c
>> @@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
>>     uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
>>     uint64_t total_sectors;
>> 
>> +    if (nb_sectors > INT_MAX) {
>> +        return false;
>> +    }
>>     if (sector & dev->sector_mask) {
>>         return false;
>>     }
>> 
>> 
>> That's something that hasn't been checked for ages either.
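
(For scale, an editorial aside: with 512-byte sectors, nb_sectors = size >> 9
only exceeds INT_MAX for a single request of 1 TiB or more, so the new guard
rejects such pathological guest requests before any subsequent int arithmetic
could truncate them.)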
> 
> Adding checks can never hurt, so go for it. ;-)

I will add both checks, address Fam's comment, and send a v2 tomorrow.

Peter

> 
> Kevin


