qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for 2.7 1/1] qcow2: improve qcow2_co_write_zeroe


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH for 2.7 1/1] qcow2: improve qcow2_co_write_zeroes()
Date: Mon, 25 Apr 2016 11:05:53 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Am 23.04.2016 um 14:05 hat Denis V. Lunev geschrieben:
> Unfortunately Linux kernel could send non-aligned requests to qemu-nbd
> if the caller is using O_DIRECT and does not align in-memory data to
> page. Thus qemu-nbd will call block layer with non-aligned requests.
> 
> qcow2_co_write_zeroes forcibly asks the caller to supply block-aligned
> data. In the other case it rejects with ENOTSUP which is properly
> handled on the upper level. The problem is that this grows the image.
> 
> This could be optimized a bit:
> - particular request could be split to block aligned part and head/tail,
>   which could be handled separately

In fact, this is what bdrv_co_do_write_zeroes() is already supposed to
do. qcow2 exposes its cluster size as bs->bl.write_zeroes_alignment, so
block/io.c should split the request in three.

If you see something different happening, we may have a bug there.

> - writes could be omitted when we do know that the image already contains
>   zeroes at the offsets being written

I don't think this is a valid shortcut. The semantics of a write_zeroes
operation is that the zeros (literal or as flags) are stored in this
layer and that the backing file isn't involved any more for the given
sectors. For example, a streaming operation or qemu-img rebase may
involve write_zeroes operations, and relying on the backing file would
cause corruption there (because the whole point of the operation is that
the backing file can be removed).

And to be honest, writing zero flags in memory to the cached L2 table is
an operation so quick that calling bdrv_get_block_status_above() might
actually end up slower in most cases than just setting the flag.

Kevin

> Signed-off-by: Denis V. Lunev <address@hidden>
> CC: Kevin Wolf <address@hidden>
> CC: Max Reitz <address@hidden>
> ---
>  block/qcow2.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 470734b..9bdaa15 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2411,21 +2411,69 @@ finish:
>      return ret;
>  }
>  
> +static int write_zeroes_chunk(BlockDriverState *bs, int64_t sector_num, int 
> nr)
> +{
> +    int ret, count;
> +    BlockDriverState *file;
> +    uint8_t *buf;
> +    struct iovec iov;
> +    QEMUIOVector local_qiov;
> +
> +    ret = bdrv_get_block_status_above(bs, NULL, sector_num, nr, &count, 
> &file);
> +    if (ret > 0 && (ret & BDRV_BLOCK_ZERO) && count == nr) {
> +        /* Nothing to do. The area is zeroed already.
> +           Worth to check to avoid image expansion for non-aligned reqs. */
> +        return 0;
> +    }
> +
> +    buf = qemu_blockalign0(bs, nr << BDRV_SECTOR_BITS);
> +    iov = (struct iovec) {
> +        .iov_base   = buf,
> +        .iov_len    = nr << BDRV_SECTOR_BITS,
> +    };
> +    qemu_iovec_init_external(&local_qiov, &iov, 1);
> +
> +    return qcow2_co_writev(bs, sector_num, nr, &local_qiov);
> +}
> +
>  static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
>      int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
>  {
>      int ret;
>      BDRVQcow2State *s = bs->opaque;
>  
> -    /* Emulate misaligned zero writes */
> -    if (sector_num % s->cluster_sectors || nb_sectors % s->cluster_sectors) {
> -        return -ENOTSUP;
> +    int nr = sector_num % s->cluster_sectors;
> +    if (nr != 0) {
> +        nr = MIN(s->cluster_sectors - nr, nb_sectors);
> +
> +        ret = write_zeroes_chunk(bs, sector_num, nr);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        sector_num += nr;
> +        nb_sectors -= nr;
> +        if (nb_sectors == 0) {
> +            return 0;
> +        }
> +    }
> +
> +    nr = nb_sectors % s->cluster_sectors;
> +    if (nr != 0) {
> +        ret = write_zeroes_chunk(bs, sector_num + nb_sectors - nr, nr);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        nb_sectors -= nr;
> +        if (nb_sectors == 0) {
> +            return 0;
> +        }
>      }
>  
>      /* Whatever is left can use real zero clusters */
>      qemu_co_mutex_lock(&s->lock);
> -    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS,
> -        nb_sectors);
> +    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, 
> nb_sectors);
>      qemu_co_mutex_unlock(&s->lock);
>  
>      return ret;
> -- 
> 2.5.0
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]