[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 0/8] virtio-blk: multiqueue support
From: |
Roman Penyaev |
Subject: |
Re: [Qemu-devel] [PATCH v2 0/8] virtio-blk: multiqueue support |
Date: |
Sat, 4 Jun 2016 17:49:16 +0200 |
Hi,
On Sat, Jun 4, 2016 at 12:26 AM, Stefan Hajnoczi <address@hidden> wrote:
> On Thu, Jun 02, 2016 at 05:19:41PM -0700, Stefan Hajnoczi wrote:
>> On Mon, May 30, 2016 at 06:25:58PM -0700, Stefan Hajnoczi wrote:
>> > v2:
>> > * Simplify s->rq live migration [Paolo]
>> > * Use more efficient bitmap ops for batch notification [Paolo]
>> > * Fix perf regression due to batch notify BH in wrong AioContext
>> > [Christian]
>> >
>> > The virtio_blk guest driver has supported multiple virtqueues since Linux
>> > 3.17.
>> > This patch series adds multiple virtqueues to QEMU's virtio-blk emulated
>> > device.
>> >
>> > Ming Lei sent patches previously but these were not merged. This series
>> > implements virtio-blk multiqueue for QEMU from scratch since the codebase
>> > has
>> > changed. Live migration support for s->rq was also missing from the
>> > previous
>> > series and has been added.
>> >
>> > It's important to note that QEMU's block layer does not support multiqueue
>> > yet.
>> > Therefore virtio-blk device processes all virtqueues in the same AioContext
>> > (IOThread). Further work is necessary to take advantage of multiqueue
>> > support
>> > in QEMU's block layer once it becomes available.
>> >
>> > I will post performance results once they are ready.
>> >
>> > Stefan Hajnoczi (8):
>> > virtio-blk: use batch notify in non-dataplane case
>> > virtio-blk: tell dataplane which vq to notify
>> > virtio-blk: associate request with a virtqueue
>> > virtio-blk: add VirtIOBlockConf->num_queues
>> > virtio-blk: multiqueue batch notify
>> > virtio-blk: live migrateion s->rq with multiqueue
>> > virtio-blk: dataplane multiqueue support
>> > virtio-blk: add num-queues device property
>> >
>> > hw/block/dataplane/virtio-blk.c | 68 +++++++++++----------
>> > hw/block/dataplane/virtio-blk.h | 2 +-
>> > hw/block/virtio-blk.c | 129
>> > +++++++++++++++++++++++++++++++++++-----
>> > include/hw/virtio/virtio-blk.h | 8 ++-
>> > 4 files changed, 159 insertions(+), 48 deletions(-)
>>
>> There is a significant performance regression due to batch notify:
>>
>> $ ./analyze.py runs/
>> Name IOPS Error
>> unpatched-d6550e9ed2 19269820.2 ± 1.36%
>> unpatched-d6550e9ed2-2 19567358.4 ± 2.42%
>> v2-batch-only-f27ed9a4d9 16252227.2 ± 6.09%
>> v2-no-dataplane 14560225.4 ± 5.16%
>> v2-no-dataplane-2 14622535.6 ± 10.08%
>> v2-no-dataplane-3 13960670.8 ± 7.11%
>>
>> unpatched-d6550e9ed2 is without this patch series.
>> v2-batch-only-f27ed9a4d9 is with Patch 1 only. v2-no-dataplane is with
>> the patch series (dataplane is not enabled in any of these tests).
>>
>> Next I will compare unpatched dataplane against patched dataplane. I
>> want to make sure Patch 1 faithfully moved batch notify from dataplane
>> code to generic virtio-blk code without affecting performance.
>>
>> If there is no difference then it means batch notify decreases
>> performance for some workloads (obviously not the same workload that
>> Ming Lei was running).
>
> It turns out that Patch 1 slows down dataplane even though the code
> looks equivalent. After a lot of poking it turned out to be a subtle
> issue:
>
> The order of BHs in the AioContext->first_bh list affects performance.
> Linux AIO (block/linux-aio.c) invokes completion callbacks from a BH.
> Performance is much better if virtio-blk.c's batch BH is after the
> completion BH.
>
> The "fast" ordering notifies the guest in ~300 nanoseconds after the
> last request completion.
>
> The "slow" ordering sometimes takes 100 microseconds after the last
> request completion before the guest is notified. It probably depends on
> whether the event loop is kicked by another source.
>
> I'm thinking of scrapping the batch BH and instead using a notify
> plug/unplug callback to suppress notification until the last request has
> been processed.
>
> I also checked that batch notification does indeed improve performance
> compared to no batching. It offers a nice boost so we do want to port
> the feature from dataplane to non-dataplane.
>
> For the time being: consider this patch series broken due to the
> performance regression.
>
> Stefan
Stefan, could you please share your loads? I tried on my fio scripts and did
not notice any significant difference. Would be interesting to understand
the root cause.
--
Roman