[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function
From: |
Stefan Hajnoczi |
Subject: |
Re: [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function |
Date: |
Thu, 17 Apr 2025 16:35:33 -0400 |
On Thu, Apr 17, 2025 at 01:39:09PM -0500, Eric Blake wrote:
> There are some optimizations that require knowing if an image starts
> out as reading all zeroes, such as making blockdev-mirror faster by
> skipping the copying of source zeroes to the destination. The
> existing bdrv_co_is_zero_fast() is a good building block for answering
> this question, but it tends to give an answer of 0 for a file we just
> created via QMP 'blockdev-create' or similar (such as 'qemu-img create
> -f raw'). Why? Because file-posix.c insists on allocating a tiny
> header to any file rather than leaving it 100% sparse, due to some
> filesystems that are unable to answer alignment probes on a hole. But
> teaching file-posix.c to read the tiny header doesn't scale - the
> problem of a small header is also visible when libvirt sets up an NBD
> client to a just-created file on a migration destination host.
>
> So, we need a wrapper function that handles a bit more complexity in a
> common manner for all block devices - when the BDS is mostly a hole,
> but has a small non-hole header, it is still worth the time to read
> that header and check if it reads as all zeroes before giving up and
> returning a pessimistic answer.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> include/block/block-io.h | 2 ++
> block/io.c | 58 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 60 insertions(+)
>
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index b49e0537dd4..b99cc98d265 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -161,6 +161,8 @@ bdrv_is_allocated_above(BlockDriverState *bs,
> BlockDriverState *base,
>
> int coroutine_fn GRAPH_RDLOCK
> bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset, int64_t bytes);
> +int coroutine_fn GRAPH_RDLOCK
> +bdrv_co_is_all_zeroes(BlockDriverState *bs);
>
> int GRAPH_RDLOCK
> bdrv_apply_auto_read_only(BlockDriverState *bs, const char *errmsg,
> diff --git a/block/io.c b/block/io.c
> index 6ef78070915..dc1341e4029 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2778,6 +2778,64 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState
> *bs, int64_t offset,
> return 1;
> }
>
> +/*
> + * Check @bs (and its backing chain) to see if the entire image is known
> + * to read as zeroes.
> + * Return 1 if that is the case, 0 otherwise and -errno on error.
> + * This test is meant to be fast rather than accurate so returning 0
> + * does not guarantee non-zero data; however, it can report 1 in more
False negatives are possible, let's also document that false positives
are not possible:
This test is mean to be fast rather than accurate so returning 0 does
not guarantee non-zero data, but returning 1 does guarantee all zero
data; ...
> + * cases than bdrv_co_is_zero_fast.
> + */
> +int coroutine_fn bdrv_co_is_all_zeroes(BlockDriverState *bs)
> +{
> + int ret;
> + int64_t pnum, bytes;
> + char *buf;
> + QEMUIOVector local_qiov;
> + IO_CODE();
> +
> + bytes = bdrv_co_getlength(bs);
> + if (bytes < 0) {
> + return bytes;
> + }
> +
> + /* First probe - see if the entire image reads as zero */
> + ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_BSTAT_ZERO,
> + 0, bytes, &pnum, NULL, NULL,
> + NULL);
> + if (ret < 0) {
> + return ret;
> + }
> + if (ret & BDRV_BLOCK_ZERO) {
> + return bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
> + }
> +
> + /*
> + * Because of the way 'blockdev-create' works, raw files tend to
> + * be created with a non-sparse region at the front to make
> + * alignment probing easier. If the block starts with only a
> + * small allocated region, it is still worth the effort to see if
> + * the rest of the image is still sparse, coupled with manually
> + * reading the first region to see if it reads zero after all.
> + */
> + if (pnum > qemu_real_host_page_size()) {
Probably not worth it for the corner case, but replacing
qemu_real_host_page_size() with 128 KiB would allow this to work on
images created on different CPU architectures (4 KiB vs 64 KiB page
sizes).
> + return 0;
> + }
> + ret = bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
> + if (ret <= 0) {
> + return ret;
> + }
> + /* Only the head of the image is unknown, and it's small. Read it. */
> + buf = qemu_blockalign(bs, pnum);
> + qemu_iovec_init_buf(&local_qiov, buf, pnum);
> + ret = bdrv_driver_preadv(bs, 0, pnum, &local_qiov, 0, 0);
> + if (ret >= 0) {
> + ret = buffer_is_zero(buf, pnum);
> + }
> + qemu_vfree(buf);
> + return ret;
> +}
> +
> int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
> int64_t bytes, int64_t *pnum)
> {
> --
> 2.49.0
>
>
signature.asc
Description: PGP signature