From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v4 0/7] virtio-blk: multiqueue support
Date: Tue, 21 Jun 2016 14:28:42 +0100

On Tue, Jun 21, 2016 at 1:25 PM, Christian Borntraeger
<address@hidden> wrote:
> On 06/21/2016 02:13 PM, Stefan Hajnoczi wrote:
>> v4:
>>  * Rebased onto qemu.git/master
>>  * Included latest performance results
>>
>> v3:
>>  * Drop Patch 1 to batch guest notify for non-dataplane
>>
>>    The Linux AIO completion BH and the virtio-blk batch notify BH changed
>>    order in the AioContext->first_bh list as a side-effect of moving the BH
>>    from hw/block/dataplane/virtio-blk.c to hw/block/virtio-blk.c.  This
>>    caused a serious performance regression for both dataplane and
>>    non-dataplane.
>>
>>    I've decided not to move the BH in this series and work on a separate
>>    solution for making batch notify generic.
>>
>>    The remaining patches have been reordered and cleaned up.
>>
>>  * See performance data below.
>>
>> v2:
>>  * Simplify s->rq live migration [Paolo]
>>  * Use more efficient bitmap ops for batch notification [Paolo]
>>  * Fix perf regression due to batch notify BH in wrong AioContext [Christian]
>>
>> The virtio_blk guest driver has supported multiple virtqueues since Linux
>> 3.17.  This patch series adds multiple virtqueues to QEMU's virtio-blk
>> emulated device.
>>
>> Ming Lei sent patches previously but these were not merged.  This series
>> implements virtio-blk multiqueue for QEMU from scratch since the codebase has
>> changed.  Live migration support for s->rq was also missing from the previous
>> series and has been added.
>>
>> It's important to note that QEMU's block layer does not support multiqueue
>> yet.  Therefore the virtio-blk device processes all virtqueues in the same
>> AioContext (IOThread).  Further work is necessary to take advantage of
>> multiqueue support in QEMU's block layer once it becomes available.
>>
>> Performance results:
>>
>> Using virtio-blk-pci,num-queues=4 can produce a speed-up but -smp 4
>> introduces a lot of variance across runs.  No pinning was performed.
>>
>> RHEL 7.2 guest on RHEL 7.2 host with 1 vcpu and 1 GB RAM unless otherwise
>> noted.  The default configuration of the Linux null_blk driver is used as
>> /dev/vdb.
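
(The exact QEMU invocation isn't given in the thread.  As a rough sketch of
this kind of setup, with illustrative IDs and guessed cache/aio settings, the
multiqueue dataplane configuration might look something like the following;
the root disk and the rest of the guest configuration are omitted:

$ modprobe null_blk    # on the host; the default config creates /dev/nullb0
$ qemu-system-x86_64 -machine accel=kvm -smp 4 -m 1G \
    -drive if=none,id=drive1,file=/dev/nullb0,format=raw,cache=none,aio=native \
    -object iothread,id=iothread1 \
    -device virtio-blk-pci,drive=drive1,num-queues=4,iothread=iothread1

Dropping the iothread= property presumably gives the no-dataplane runs, and
num-queues=1, the default, gives the no-mq runs.)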
>>
>> $ cat files/fio.job
>> [global]
>> filename=/dev/vdb
>> ioengine=libaio
>> direct=1
>> runtime=60
>> ramp_time=5
>> gtod_reduce=1
>>
>> [job1]
>> numjobs=4
>> iodepth=16
>> rw=randread
>> bs=4K
>>
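(The job file above would presumably be run inside the guest against /dev/vdb,
along the lines of:

$ fio files/fio.job

with each run's results collected under runs/ for the script below.)
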
>> $ ./analyze.py runs/
>> Name                                   IOPS   Error
>> v4-smp-4-dataplane               13326598.0 ± 6.31%
>> v4-smp-4-dataplane-no-mq         11483568.0 ± 3.42%
>> v4-smp-4-no-dataplane            18108611.6 ± 1.53%
>> v4-smp-4-no-dataplane-no-mq      13951225.6 ± 7.81%
>
> This differs from the previous numbers. Which results are with
> the patch and which are without? I am surprised to see dataplane
> being slower than no-dataplane - this contradicts everything
> that I have seen in the past.

I reran without the patch, just qemu.git/master:
unpatched-7e13ea57f-smp-4-dataplane   11564565.4 ± 3.08%
unpatched-7e13ea57f-smp-4-no-dataplane   14262888.8 ± 2.82%

The host is Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (16 logical
CPUs) with 32 GB RAM.

So the trend is the same without the patch.  Therefore I'm "satisfied"
that the mq vs no-mq numbers show an advantage for multiqueue.

They also show that this patch series does not introduce a regression:
v4-smp-4-dataplane-no-mq is close to
unpatched-7e13ea57f-smp-4-dataplane (11483568.0 ± 3.42% vs 11564565.4
± 3.08%) and v4-smp-4-no-dataplane-no-mq is close to
unpatched-7e13ea57f-smp-4-no-dataplane (13951225.6 ± 7.81% vs
14262888.8 ± 2.82%).
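
As a rough sanity check (computed here with python3 just for illustration),
those relative differences work out to about 0.7% and 2.2% respectively, well
inside the quoted error bars:

$ python3 -c 'print((11564565.4 - 11483568.0) / 11564565.4 * 100)'
$ python3 -c 'print((14262888.8 - 13951225.6) / 14262888.8 * 100)'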

Stefan


