qemu-block

Re: [PATCH RFC 0/3] hw/block/nvme: dif-based end-to-end data protection


From: Klaus Jensen
Subject: Re: [PATCH RFC 0/3] hw/block/nvme: dif-based end-to-end data protection support
Date: Fri, 18 Dec 2020 10:39:01 +0100

On Dec 17 13:14, Keith Busch wrote:
> On Thu, Dec 17, 2020 at 10:02:19PM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > This series adds support for extended LBAs and end-to-end data
> > protection. Marked RFC, since there are a bunch of issues that could use
> > some discussion.
> > 
> > Storing metadata bytes contiguously with the logical block data and
> > creating a physically extended logical block basically breaks the DULBE
> > and deallocation support I just added. Formatting a namespace with
> > protection information requires the app- and reftags of deallocated or
> > unwritten blocks to be 0xffff and 0xffffffff respectively; this could be
> > used to reintroduce DULBE support in that case, albeit at a somewhat
> > higher cost than the block status flag-based approach.
> > 
> > There are basically three ways of storing metadata (and maybe a fourth,
> > but that is probably quite the endeavour):
> > 
> >   1. Storing metadata as extended blocks directly on the blockdev. That
> >      is the approach used in this RFC.
> > 
> >   2. Use a separate blockdev. Incidentally, this is also the easiest
> >      and most straightforward solution to support MPTR-based "separate
> >      metadata". This also allows DULBE and block deallocation to be
> >      supported using the existing approach.
> > 
> >   3. A hybrid of 1 and 2 where the metadata is stored contiguously at
> >      the end of the nvme-ns blockdev.
> > 
> > Option 1 obviously works well with DIF-based protection information and
> > extended LBAs since it maps one to one. Option 2 works flawlessly with
> > MPTR-based metadata, but extended LBAs can be "emulated" at the cost of
> > a bunch of scatter/gather operations.
> 
> Are there any actual users of extended metadata that we care about? I'm
> aware of only a few niche places that can even access an extended
> metadata format. There's no kernel support in any major OS that I know
> of.
> 

Yes, there are definitely actual users in enterprise storage. But the
main use case here is testing (using extended LBAs with SPDK for
instance).

> Option 2 sounds fine.
> 
> If option 3 means that you're still using MPTR, but just sequester space
> at the end of the backing block device for meta-data purposes, then that
> is fine too. You can even resize it dynamically if you want to support
> different metadata sizes.

Heh, I tend to think that my English vocabulary is pretty decent but I
had to look up 'sequester'. I just learned a new word today \o/

Yes. I actually also like option 3. Technically option 2 does not break
image interoperability between devices (ide, virtio), but you would
leave out a bunch of metadata that your application might depend on, so
I don't see any way to not break interoperability really.

And I would then be just fine with "emulating" extended LBAs at the cost
of more I/O. Because I would like the device to support that mode of
operation as well. We have not implemented this, but my gut feeling says
that it can be done.
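
To illustrate what "emulating" extended LBAs on top of the option 3
layout could look like, here is a minimal sketch in Python. It is not
taken from the series: the function names (offsets, write_extended,
read_extended) and the flat bytearray standing in for the blockdev are
assumptions made purely for illustration. Data blocks sit at the start
of the image, the metadata for all blocks sits contiguously after them,
and an extended-LBA transfer is split or reassembled one block at a
time, which is where the extra I/O comes from.

```python
def offsets(lba, lba_size, ms, nlbas):
    """Return (data_offset, metadata_offset) for one logical block.

    Assumed layout: nlbas data blocks of lba_size bytes, followed by
    nlbas metadata areas of ms bytes each at the end of the image.
    """
    data_off = lba * lba_size
    meta_off = nlbas * lba_size + lba * ms
    return data_off, meta_off


def write_extended(image, slba, buf, lba_size, ms, nlbas):
    """Scatter an extended-LBA buffer (data+metadata interleaved)."""
    elba = lba_size + ms
    assert len(buf) % elba == 0, "buffer must hold whole extended blocks"
    for i in range(len(buf) // elba):
        chunk = buf[i * elba:(i + 1) * elba]
        d, m = offsets(slba + i, lba_size, ms, nlbas)
        image[d:d + lba_size] = chunk[:lba_size]   # data part
        image[m:m + ms] = chunk[lba_size:]         # metadata part


def read_extended(image, slba, nlb, lba_size, ms, nlbas):
    """Gather data and metadata back into an extended-LBA buffer."""
    out = bytearray()
    for lba in range(slba, slba + nlb):
        d, m = offsets(lba, lba_size, ms, nlbas)
        out += image[d:d + lba_size]
        out += image[m:m + ms]
    return bytes(out)
```

Each extended block turns into two accesses (one per region), so a
transfer of N blocks needs on the order of 2N scatter/gather segments
instead of one contiguous range.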

> 
> > The 4th option is extending an existing image format (QCOW2) or creating
> > something on top of RAW to support metadata bytes per block. But both
> > approaches require full API support through the block layer. And
> > probably a lot of other stuff that I did not think about.
> 
> It definitely sounds appealing to push the feature to a lower level if
> you're really willing to see that through.
> 

Yes, it's super appealing and I would like to have some input from the
block layer guys on this. That is, if anyone has ever explored it?

> In any case, calculating T10 CRCs is *really* slow unless you have
> special hardware and software support for it.
> 

Yeah. I know this is super slow. But for emulation and testing purposes
I think it is a nice feature for the device to optionally offer.
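
For reference, a rough sketch of where the cost comes from. This is a
plain bit-by-bit software version of the T10 DIF guard CRC (polynomial
0x8BB7, initial value 0, no reflection), written in Python only for
illustration; it is not the code from the series, and the helper names
crc16_t10dif and pack_pi are made up here. The 8-byte tuple layout
(guard, app tag, ref tag, all big-endian) is the usual DIF one.

```python
import struct

T10_DIF_POLY = 0x8BB7


def crc16_t10dif(data, crc=0):
    """Bitwise CRC-16/T10-DIF: eight shift/xor steps per input byte."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def pack_pi(block, apptag, reftag):
    """Build an 8-byte protection information tuple for one block."""
    return struct.pack(">HHI", crc16_t10dif(block), apptag, reftag)
```

A table-driven variant cuts the inner loop to one lookup per byte, but
without hardware assistance the guard still has to be computed over
every byte of every block that is read or written.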
