[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH V4 5/6] block/rbd: add write zeroes support
From: |
Ilya Dryomov |
Subject: |
Re: [PATCH V4 5/6] block/rbd: add write zeroes support |
Date: |
Fri, 2 Jul 2021 14:24:27 +0200 |
On Fri, Jul 2, 2021 at 11:09 AM Peter Lieven <pl@kamp.de> wrote:
>
> this patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores
> BDRV_REQ_MAY_UNMAP
> for older librbd versions.
>
> The rationale for this is as following (citing Ilya Dryomov current RBD
> maintainer):
> ---8<---
> a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
> and as a consequence always unmap if librbd is too old
>
> It's not clear what qemu's expectation is but in general Write
> Zeroes is allowed to unmap. The only guarantee is that subsequent
> reads return zeroes, everything else is a hint. This is how it is
> specified in the kernel and in the NVMe spec.
>
> In particular, block/nvme.c implements it as follows:
>
> if (flags & BDRV_REQ_MAY_UNMAP) {
> cdw12 |= (1 << 25);
> }
>
> This sets the Deallocate bit. But if it's not set, the device may
> still deallocate:
>
> """
> If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
> command, and the namespace supports clearing all bytes to 0h in the
> values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
> from a deallocated logical block and its metadata (excluding
> protection information), then for each specified logical block, the
> controller:
> - should deallocate that logical block;
>
> ...
>
> If the Deallocate bit is cleared to '0' in a Write Zeroes command,
> and the namespace supports clearing all bytes to 0h in the values
> read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
> a deallocated logical block and its metadata (excluding protection
> information), then, for each specified logical block, the
> controller:
> - may deallocate that logical block;
> """
>
>
> https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf
>
> b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags
>
> Again, it's not clear what qemu expects here, but without it we end
> up in a ridiculous situation where specifying the "don't allow slow
> fallback" switch immediately fails all efficient zeroing requests on
> a device where Write Zeroes is always efficient:
>
> $ qemu-io -c 'help write' | grep -- '-[zun]'
> -n, -- with -z, don't allow slow fallback
> -u, -- with -z, allow unmapping
> -z, -- write zeroes using blk_co_pwrite_zeroes
>
> $ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar
> write failed: Operation not supported
> --->8---
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> block/rbd.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index be0471944a..149317d33c 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -63,7 +63,8 @@ typedef enum {
> RBD_AIO_READ,
> RBD_AIO_WRITE,
> RBD_AIO_DISCARD,
> - RBD_AIO_FLUSH
> + RBD_AIO_FLUSH,
> + RBD_AIO_WRITE_ZEROES
> } RBDAIOCmd;
>
> typedef struct BDRVRBDState {
> @@ -705,6 +706,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict
> *options, int flags,
> }
> }
>
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> + bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
> +#endif
> +
> /* When extending regular files, we get zeros from the OS */
> bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>
> @@ -827,6 +832,18 @@ static int coroutine_fn
> qemu_rbd_start_co(BlockDriverState *bs,
> case RBD_AIO_FLUSH:
> r = rbd_aio_flush(s->image, c);
> break;
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> + case RBD_AIO_WRITE_ZEROES: {
> + int zero_flags = 0;
> +#ifdef RBD_WRITE_ZEROES_FLAG_THICK_PROVISION
> + if (!(flags & BDRV_REQ_MAY_UNMAP)) {
> + zero_flags = RBD_WRITE_ZEROES_FLAG_THICK_PROVISION;
> + }
> +#endif
> + r = rbd_aio_write_zeroes(s->image, offset, bytes, c, zero_flags, 0);
> + break;
> + }
> +#endif
> default:
> r = -EINVAL;
> }
> @@ -897,6 +914,16 @@ static int coroutine_fn
> qemu_rbd_co_pdiscard(BlockDriverState *bs,
> return qemu_rbd_start_co(bs, offset, count, NULL, 0, RBD_AIO_DISCARD);
> }
>
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> +static int
> +coroutine_fn qemu_rbd_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
> + int count, BdrvRequestFlags flags)
> +{
> + return qemu_rbd_start_co(bs, offset, count, NULL, flags,
> + RBD_AIO_WRITE_ZEROES);
> +}
> +#endif
> +
> static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
> {
> BDRVRBDState *s = bs->opaque;
> @@ -1120,6 +1147,9 @@ static BlockDriver bdrv_rbd = {
> .bdrv_co_pwritev = qemu_rbd_co_pwritev,
> .bdrv_co_flush_to_disk = qemu_rbd_co_flush,
> .bdrv_co_pdiscard = qemu_rbd_co_pdiscard,
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> + .bdrv_co_pwrite_zeroes = qemu_rbd_co_pwrite_zeroes,
> +#endif
>
> .bdrv_snapshot_create = qemu_rbd_snap_create,
> .bdrv_snapshot_delete = qemu_rbd_snap_remove,
> --
> 2.17.1
>
>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Thanks,
Ilya
- [PATCH V4 0/6] block/rbd: migrate to coroutines and add write zeroes support, Peter Lieven, 2021/07/02
- [PATCH V4 3/6] block/rbd: update s->image_size in qemu_rbd_getlength, Peter Lieven, 2021/07/02
- [PATCH V4 1/6] block/rbd: bump librbd requirement to luminous release, Peter Lieven, 2021/07/02
- [PATCH V4 4/6] block/rbd: migrate from aio to coroutines, Peter Lieven, 2021/07/02
- [PATCH V4 2/6] block/rbd: store object_size in BDRVRBDState, Peter Lieven, 2021/07/02
- [PATCH V4 6/6] block/rbd: drop qemu_rbd_refresh_limits, Peter Lieven, 2021/07/02
- [PATCH V4 5/6] block/rbd: add write zeroes support, Peter Lieven, 2021/07/02
- Re: [PATCH V4 5/6] block/rbd: add write zeroes support,
Ilya Dryomov <=
- Re: [PATCH V4 0/6] block/rbd: migrate to coroutines and add write zeroes support, Ilya Dryomov, 2021/07/02