Re: [PATCH RFC 0/3] hw/block/nvme: dif-based end-to-end data protection
From: Keith Busch
Subject: Re: [PATCH RFC 0/3] hw/block/nvme: dif-based end-to-end data protection support
Date: Thu, 17 Dec 2020 13:14:40 -0800
On Thu, Dec 17, 2020 at 10:02:19PM +0100, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
>
> This series adds support for extended LBAs and end-to-end data
> protection. Marked RFC, since there are a bunch of issues that could use
> some discussion.
>
> Storing metadata bytes contiguously with the logical block data and
> creating a physically extended logical block basically breaks the DULBE
> and deallocation support I just added. Formatting a namespace with
> protection information requires the app- and reftags of deallocated or
> unwritten blocks to be 0xffff and 0xffffffff respectively; this could be
> used to reintroduce DULBE support in that case, albeit at a somewhat
> higher cost than the block status flag-based approach.
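To make the cost mentioned above concrete, here is a minimal sketch of such a tag-based check. It assumes the usual 8-byte T10 DIF tuple layout (2-byte guard, 2-byte application tag, 4-byte reference tag, all big-endian); the helper names are hypothetical and are not taken from the series. The point is that every block's tuple has to be read and inspected, where the block status approach only needs a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tuple layout: guard (2 bytes), app tag (2 bytes), ref tag (4 bytes). */
#define DIF_TUPLE_SIZE 8

/* Hypothetical helper: extract the big-endian application tag. */
static uint16_t dif_apptag(const uint8_t *tuple)
{
    return (uint16_t)((tuple[2] << 8) | tuple[3]);
}

/* Hypothetical helper: extract the big-endian reference tag. */
static uint32_t dif_reftag(const uint8_t *tuple)
{
    return ((uint32_t)tuple[4] << 24) | ((uint32_t)tuple[5] << 16) |
           ((uint32_t)tuple[6] << 8) | (uint32_t)tuple[7];
}

/*
 * Returns true if the tuple carries the "deallocated or unwritten"
 * pattern described above: app tag 0xffff and ref tag 0xffffffff.
 */
static bool dif_block_is_unwritten(const uint8_t *tuple)
{
    assert(tuple != NULL);
    return dif_apptag(tuple) == 0xffff && dif_reftag(tuple) == 0xffffffff;
}
```

A DULBE-style check over a range would have to call something like dif_block_is_unwritten() once per logical block after reading the metadata from the backing device.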
>
> There are basically three ways of storing metadata (and maybe a fourth,
> but that is probably quite the endeavour):
>
> 1. Storing metadata as extended blocks directly on the blockdev. That
> is the approach used in this RFC.
>
> 2. Use a separate blockdev. Incidentally, this is also the easiest
> and most straightforward solution to support MPTR-based "separate
> metadata". This also allows DULBE and block deallocation to be
> supported using the existing approach.
>
> 3. A hybrid of 1 and 2 where the metadata is stored contiguously at
> the end of the nvme-ns blockdev.
>
> Option 1 obviously works well with DIF-based protection information and
> extended LBAs since it maps one to one. Option 2 works flawlessly with
> MPTR-based metadata, but extended LBAs can be "emulated" at the cost of
> a bunch of scatter/gather operations.
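For reference, the scatter/gather emulation mentioned for option 2 might look roughly like the sketch below. It assumes data and metadata have already been read into two separate buffers; gather_extended() and its parameters are hypothetical names for illustration, not functions from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build an extended-LBA buffer (data block immediately followed by its
 * metadata bytes) from separately stored data and metadata.
 *
 *   nlb    number of logical blocks
 *   lbasz  logical block data size in bytes
 *   ms     metadata size per block in bytes
 *
 * dst must have room for nlb * (lbasz + ms) bytes.
 */
static void gather_extended(uint8_t *dst, const uint8_t *data,
                            const uint8_t *meta, size_t nlb,
                            size_t lbasz, size_t ms)
{
    assert(dst != NULL && data != NULL && meta != NULL);

    for (size_t i = 0; i < nlb; i++) {
        memcpy(dst, data + i * lbasz, lbasz);
        dst += lbasz;
        memcpy(dst, meta + i * ms, ms);
        dst += ms;
    }
}
```

The write path would be the mirror image (scatter), splitting each extended block back into its data and metadata parts. With option 1 none of this copying is needed, since the on-disk layout already matches.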
Are there any actual users of extended metadata that we care about? I'm
aware of only a few niche places that can even access an extended
metadata format. There's no kernel support in any major OS that I know
of.

Option 2 sounds fine.

If option 3 means that you're still using MPTR, but just sequester space
at the end of the backing block device for metadata purposes, then that
is fine too. You can even resize it dynamically if you want to support
different metadata sizes.
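A rough sketch of what that layout could look like, purely as an illustration: the struct and function names below are hypothetical, and the split is simply recomputed from the device size whenever the namespace is formatted with a different metadata size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout descriptor for "metadata at the end of the blockdev". */
struct ns_layout {
    uint64_t nlb;        /* number of logical blocks that fit */
    uint64_t meta_start; /* byte offset where the metadata region begins */
};

/*
 * Split a backing device of dev_size bytes into a data region followed
 * by a metadata region, given the data size (lbasz) and metadata size
 * (ms) per block. Calling this again with a different ms "resizes" the
 * metadata region.
 */
static struct ns_layout ns_layout_init(uint64_t dev_size, uint32_t lbasz,
                                       uint32_t ms)
{
    struct ns_layout l;

    assert(lbasz > 0);
    l.nlb = dev_size / ((uint64_t)lbasz + ms);
    l.meta_start = l.nlb * lbasz;
    return l;
}

/* Byte offset of the data for a given LBA. */
static uint64_t ns_data_offset(uint64_t lba, uint32_t lbasz)
{
    return lba * lbasz;
}

/* Byte offset of the metadata for a given LBA. */
static uint64_t ns_meta_offset(const struct ns_layout *l, uint64_t lba,
                               uint32_t ms)
{
    return l->meta_start + lba * ms;
}
```

Since the data region stays densely packed from offset zero, the existing block status based DULBE and deallocation handling should keep working on it unchanged.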
> The 4th option is extending an existing image format (QCOW2) or creating
> something on top of RAW to support metadata bytes per block. But both
> approaches require full API support through the block layer. And
> probably a lot of other stuff that I did not think about.
It definitely sounds appealing to push the feature to a lower level if
you're really willing to see that through.

In any case, calculating T10 CRCs is *really* slow unless you have
special hardware and software support for it.
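To illustrate why, here is a plain bit-at-a-time version of the T10 DIF guard CRC (polynomial 0x8bb7, initial value 0, no reflection, no final XOR). This is only a reference sketch, not the code from the series; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-16/T10-DIF. Pass 0 as the initial crc; the return value
 * can be fed back in to continue over a following buffer.
 */
static uint16_t crc16_t10dif(uint16_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        /* Fold the next byte into the top of the register. */
        crc ^= (uint16_t)(buf[i] << 8);

        /* Shift out one bit at a time, reducing by the polynomial. */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x8bb7);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }

    return crc;
}
```

That is eight shift/XOR rounds per byte, so 4096 inner iterations for every 512-byte block. A lookup table cuts that to one step per byte, and the fast implementations generally rely on carry-less multiply instructions to process many bytes at once.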