[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCHv4] block: optimize zero writes with bdrv_write_z
From: |
Peter Lieven |
Subject: |
Re: [Qemu-devel] [PATCHv4] block: optimize zero writes with bdrv_write_zeroes |
Date: |
Thu, 15 May 2014 23:20:25 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 |
Am 15.05.2014 11:54, schrieb Kevin Wolf:
> Am 15.05.2014 um 07:16 hat Peter Lieven geschrieben:
>> Am 14.05.2014 13:41, schrieb Kevin Wolf:
>>> Am 08.05.2014 um 18:22 hat Peter Lieven geschrieben:
>>>> this patch tries to optimize zero write requests
>>>> by automatically using bdrv_write_zeroes if it is
>>>> supported by the format.
>>>>
>>>> This significantly speeds up file system initialization and
>>>> should speed zero write test used to test backend storage
>>>> performance.
>>>>
>>>> I ran the following 2 tests on my internal SSD with a
>>>> 50G QCOW2 container and on an attached iSCSI storage.
>>>>
>>>> a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX
>>>>
>>>> QCOW2 [off] [on] [unmap]
>>>> -----
>>>> runtime: 14secs 1.1secs 1.1secs
>>>> filesize: 937M 18M 18M
>>>>
>>>> iSCSI [off] [on] [unmap]
>>>> ----
>>>> runtime: 9.3s 0.9s 0.9s
>>>>
>>>> b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct
>>>>
>>>> QCOW2 [off] [on] [unmap]
>>>> -----
>>>> runtime: 246secs 18secs 18secs
>>>> filesize: 51G 192K 192K
>>>> throughput: 203M/s 2.3G/s 2.3G/s
>>>>
>>>> iSCSI* [off] [on] [unmap]
>>>> ----
>>>> runtime: 8mins 45secs 33secs
>>>> throughput: 106M/s 1.2G/s 1.6G/s
>>>> allocated: 100% 100% 0%
>>>>
>>>> * The storage was connected via an 1Gbit interface.
>>>> It seems to internally handle writing zeroes
>>>> via WRITESAME16 very fast.
>>>>
>>>> Signed-off-by: Peter Lieven <address@hidden>
>>>> ---
>>>> v3->v4: - use QAPI generated enum and lookup table [Kevin]
>>>> - added more details about the options in the comments
>>>> of the qapi-schema [Eric]
>>>> - changed the type of detect_zeroes from str to
>>>> BlockdevDetectZeroesOptions. I left the name
>>>> as is because it is consistent with e.g.
>>>> BlockdevDiscardOptions or BlockdevAioOptions [Eric]
>>>> - changed the parse function in blockdev_init to
>>>> be generic usable for other enum parameters
>>>>
>>>> v2->v3: - moved parameter parsing to blockdev_init
>>>> - added per device detect_zeroes status to
>>>> hmp (info block -v) and qmp (query-block) [Eric]
>>>> - added support to enable detect-zeroes also
>>>> for hot added devices [Eric]
>>>> - added missing entry to qemu_common_drive_opts
>>>> - fixed description of qemu_iovec_is_zero [Fam]
>>>>
>>>> v1->v2: - added tests to commit message (Markus)
>>>> RFCv2->v1: - fixed paramter parsing strncmp -> strcmp (Eric)
>>>> - fixed typo (choosen->chosen) (Eric)
>>>> - added second example to commit msg
>>>>
>>>> RFCv1->RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline
>>>> parameter
>>>> - call zero detection only for format (bs->file != NULL)
>>>>
>>>> block.c | 11 ++++++++++
>>>> block/qapi.c | 1 +
>>>> blockdev.c | 34 +++++++++++++++++++++++++++++
>>>> hmp.c | 5 +++++
>>>> include/block/block_int.h | 1 +
>>>> include/qemu-common.h | 1 +
>>>> qapi-schema.json | 52
>>>> ++++++++++++++++++++++++++++++++-------------
>>>> qemu-options.hx | 6 ++++++
>>>> qmp-commands.hx | 3 +++
>>>> util/iov.c | 21 ++++++++++++++++++
>>>> 10 files changed, 120 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index b749d31..aea4c77 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -3244,6 +3244,17 @@ static int coroutine_fn
>>>> bdrv_aligned_pwritev(BlockDriverState *bs,
>>>>
>>>> ret = notifier_with_return_list_notify(&bs->before_write_notifiers,
>>>> req);
>>>>
>>>> + if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
>>>> + !(flags & BDRV_REQ_ZERO_WRITE) && bs->file &&
>>>> + drv->bdrv_co_write_zeroes && qemu_iovec_is_zero(qiov)) {
>>> Pretty long condition. :-)
>>>
>>> Looks like most is obviously needed, but I wonder what the bs->file part
>>> is good for? That looks rather arbitrary.
>> What I wanted to achieve is that this condition is only true if we handle
>> the format (e.g. raw, qcow2, vmdk etc.). If e.g. qcow2 then sends a
>> zero write this should always end on the disk and should not be optimizable.
> But why? This means setting an arbitrary policy for no good reason. You
> already have an option, and it already defaults to off, so unless
> someone specifically enables it for bs->file, we don't do the
> optimisation. But if someone wants to have it on bs->file, what reason
> is there to ignore that request?
I was erroneously thinking that the setting would be inherited to bs->file.
I will remove the bs->file from the condition.
>
>>>> + flags |= BDRV_REQ_ZERO_WRITE;
>>>> + /* if the device was not opened with discard=on the below flag
>>>> + * is immediately cleared again in bdrv_co_do_write_zeroes */
>>> Is it? I only see it being cleared in bdrv_co_write_zeroes(), but that's
>>> not a function that seems to be called from here.
>> You are right. Question, do we want to support detect_zeroes = unmap
>> if discard = ignore? If yes, I have to update the docs. Otherwise
>> I have to check for BDRV_O_DISCARD before setting BDRV_REQ_MAY_UNMAP.
> I think it would be reasonable enough to just error out when you try to
> open an image with detect_zeroes=unmap,discard=ignore.
>
> Can these flags be changed during runtime? If so, we need to check there
> as well.
No, but you can specify them during hot add. Same as with discard.
Peter
>
>>>> + if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
>>>> + flags |= BDRV_REQ_MAY_UNMAP;
>>>> + }
>>>> + }
>>>> +
>>>> if (ret < 0) {
>>>> /* Do nothing, write notifier decided to fail this request */
>>>> } else if (flags & BDRV_REQ_ZERO_WRITE) {
> Kevin