qemu-devel
Re: [Qemu-devel] [PATCH v2 1/1] NBD proto: add WRITE_ZEROES extension


From: Alex Bligh
Subject: Re: [Qemu-devel] [PATCH v2 1/1] NBD proto: add WRITE_ZEROES extension
Date: Thu, 31 Mar 2016 14:53:22 +0100

On 31 Mar 2016, at 14:02, Denis V. Lunev <address@hidden> wrote:

> From: Pavel Borzenkov <address@hidden>
> 
> There exist some cases when a client knows that the data it is going to
> write is all zeroes. Such cases include mirroring or backing up a device
> implemented by a sparse file.

Useful.

> -- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE`.  SHOULD be
> -  set to 1 if the client requires "Force Unit Access" mode of
> -  operation.  MUST NOT be set unless transmission flags included
> -  `NBD_FLAG_SEND_FUA`.
> +- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and
> +  `NBD_CMD_WRITE_ZEROES` commands.  SHOULD be set to 1 if the client requires
> +  "Force Unit Access" mode of operation.  MUST NOT be set unless transmission
> +  flags included `NBD_FLAG_SEND_FUA`.

Not your fault, but this should actually say "unless export flags
included". Transmission flags would be the flags sent with the command.

> +- bit 1, `NBD_CMD_MAY_TRIM`; defined by the experimental `WRITE_ZEROES`
> +  extension; see below.

For consistency, probably useful to say here:

MUST NOT be set unless the export flags include NBD_FLAG_SEND_WRITE_ZEROES.
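
As a rough illustration of the check being suggested, a server could validate
the command flags against the export flags along these lines. This is only a
sketch: the command flag bits (0 and 1) and request type 6 come from the
quoted patch, but the export flag bit positions and the function name
`validate_command_flags` are placeholders made up for this example.

```python
# Command flag bits as given in the quoted patch.
NBD_CMD_FLAG_FUA = 1 << 0
NBD_CMD_FLAG_MAY_TRIM = 1 << 1

# Export flag bits: ASSUMED placeholder positions, for illustration only.
NBD_FLAG_SEND_FUA = 1 << 3
NBD_FLAG_SEND_WRITE_ZEROES = 1 << 6

# Request types: NBD_CMD_WRITE from the existing protocol, 6 from the patch.
NBD_CMD_WRITE = 1
NBD_CMD_WRITE_ZEROES = 6


def validate_command_flags(export_flags, cmd_type, cmd_flags):
    """Hypothetical helper: return a list of violations (empty if none)."""
    problems = []
    if cmd_type == NBD_CMD_WRITE_ZEROES and not (
        export_flags & NBD_FLAG_SEND_WRITE_ZEROES
    ):
        problems.append("NBD_CMD_WRITE_ZEROES without NBD_FLAG_SEND_WRITE_ZEROES")
    if cmd_flags & NBD_CMD_FLAG_FUA:
        # FUA is valid only during WRITE and WRITE_ZEROES, per the patch.
        if cmd_type not in (NBD_CMD_WRITE, NBD_CMD_WRITE_ZEROES):
            problems.append("NBD_CMD_FLAG_FUA on a command that does not take it")
        if not (export_flags & NBD_FLAG_SEND_FUA):
            problems.append("NBD_CMD_FLAG_FUA without NBD_FLAG_SEND_FUA")
    if cmd_flags & NBD_CMD_FLAG_MAY_TRIM and not (
        export_flags & NBD_FLAG_SEND_WRITE_ZEROES
    ):
        # The extra rule proposed above for consistency.
        problems.append("NBD_CMD_FLAG_MAY_TRIM without NBD_FLAG_SEND_WRITE_ZEROES")
    return problems
```

With this, a request carrying the trim flag against an export that never
advertised the extension is rejected by the same path as the FUA case.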

> 
> #### Request types
> 
> @@ -523,6 +528,10 @@ The following request types exist:
>     A client MUST NOT send a trim request unless `NBD_FLAG_SEND_TRIM`
>     was set in the transmission flags field.
> 
> +* `NBD_CMD_WRITE_ZEROES` (6)
> +
> +    Defined by the experimental `WRITE_ZEROES` extension; see below.
> +
> * Other requests
> 
>     Some third-party implementations may require additional protocol
> @@ -654,6 +663,53 @@ option reply type.
>       message if they do not also send it as a reply to the
>       `NBD_OPT_SELECT` message.
> 
> +### `WRITE_ZEROES` extension
> +
> +There exist some cases when a client knows that the data it is going to write
> +is all zeroes. Such cases include mirroring or backing up a device implemented
> +by a sparse file. With current NBD command set, the client has to issue
> +`NBD_CMD_WRITE` command with zeroed payload and transfer these zero bytes
> +through the wire. The server has to write the data onto disk, effectively
> +losing the sparseness.
> +
> +To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds
> +one new command and one new command flag.
> +
> +* `NBD_CMD_WRITE_ZEROES` (6)
> +
> +    A write request with no payload. Length and offset define the location
> +    and amount of data to be zeroed.
> +
> +    The server MUST zero out the data on disk, and then send the reply
> +    message. The server MAY send the reply message before the data has
> +    reached permanent storage.
> +
> +    A client MUST NOT send a write zeroes request unless
> +    `NBD_FLAG_SEND_WRITE_ZEROES` was set in the transmission flags field.
> +
> +    If the `NBD_FLAG_SEND_FUA` flag was set in the transmission flags field,
> +    the client MAY set the flag `NBD_CMD_FLAG_FUA` in the command flags field.
> +    If this flag was set, the server MUST NOT send the reply until it has
> +    ensured that the newly-zeroed data has reached permanent storage.
> +
> +    If the flag `NBD_CMD_FLAG_MAY_TRIM` was set by the client in the command
> +    flags field, the server MAY use trimming to zero out the area, but it
> +    MUST ensure that the data reads back as zero.
> +

Can you give an example of a situation where the client would not set this
flag and it would be undesirable for the server to create a 'hole' using
'trim' type technology? I suspect there are already some backends (e.g.
ceph on qemu-nbd) which will effectively do a 'trim' if you write 4k of
zeroes, even under current circumstances.

I.e., why not always permit trimming PROVIDED the data always reads back
as zero? This would be far simpler.
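
To make the point concrete, here is a minimal sketch of a toy backend that
always trims where it can, with no flag involved, while still guaranteeing
that the range reads back as zero. `SparseStore`, `BLOCK_SIZE` and the method
names are invented for this example; it is not a description of how qemu-nbd,
ceph, or any real server behaves, and it omits bounds checking.

```python
BLOCK_SIZE = 4096  # assumed allocation granularity of the hypothetical backend


class SparseStore:
    """Toy backend: unallocated blocks are holes that read back as zero."""

    def __init__(self):
        self.blocks = {}  # block index -> bytearray(BLOCK_SIZE)

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, BLOCK_SIZE)
            chunk = min(BLOCK_SIZE - start, len(data) - pos)
            block = self.blocks.setdefault(index, bytearray(BLOCK_SIZE))
            block[start:start + chunk] = data[pos:pos + chunk]
            pos += chunk

    def read(self, offset, length):
        out = bytearray()
        while len(out) < length:
            index, start = divmod(offset + len(out), BLOCK_SIZE)
            chunk = min(BLOCK_SIZE - start, length - len(out))
            block = self.blocks.get(index)
            if block is None:
                out += bytes(chunk)  # hole: reads as zeroes
            else:
                out += block[start:start + chunk]
        return bytes(out)

    def write_zeroes(self, offset, length):
        end = offset + length
        pos = offset
        while pos < end:
            index, start = divmod(pos, BLOCK_SIZE)
            chunk = min(BLOCK_SIZE - start, end - pos)
            if chunk == BLOCK_SIZE:
                # Whole block covered: punch a hole; it still reads as zero.
                self.blocks.pop(index, None)
            elif index in self.blocks:
                # Partially covered block: zero the bytes in place.
                self.blocks[index][start:start + chunk] = bytes(chunk)
            pos += chunk
```

Fully covered blocks are deallocated and partially covered ones are zeroed in
place, so the client sees the same result either way; only the allocation on
the backend differs.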

-- 
Alex Bligh






