[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Preallocation does not requir
From: |
Stefano Garzarella |
Subject: |
Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Preallocation does not require writing zeroes |
Date: |
Mon, 15 Jul 2019 10:01:38 +0200 |
User-agent: |
NeoMutt/20180716 |
On Fri, Jul 12, 2019 at 08:33:14PM +0200, Max Reitz wrote:
> On 12.07.19 12:27, Stefano Garzarella wrote:
> > On Thu, Jul 11, 2019 at 03:29:35PM +0200, Max Reitz wrote:
> >> When preallocating an encrypted qcow2 image, it just lets the protocol
> >> driver write data and then does not mark the clusters as zero.
> >> Therefore, reading this image will yield effectively random data.
> >>
> >> As such, we have not fulfilled the promise of always writing zeroes when
> >> preallocating an image in a while. It seems that nobody has really
> >> cared, so change the documentation to conform to qemu's actual behavior.
> >>
> >> Signed-off-by: Max Reitz <address@hidden>
> >> ---
> >> qapi/block-core.json | 9 +++++----
> >> docs/qemu-block-drivers.texi | 4 ++--
> >> qemu-img.texi | 4 ++--
> >> 3 files changed, 9 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> index 0d43d4f37c..a4363b84d2 100644
> >> --- a/qapi/block-core.json
> >> +++ b/qapi/block-core.json
> >> @@ -5167,10 +5167,11 @@
> >> # @off: no preallocation
> >> # @metadata: preallocate only for metadata
> >> # @falloc: like @full preallocation but allocate disk space by
> >> -# posix_fallocate() rather than writing zeros.
> >> -# @full: preallocate all data by writing zeros to device to ensure disk
> >> -# space is really available. @full preallocation also sets up
> >> -# metadata correctly.
> >> +# posix_fallocate() rather than writing data.
> >> +# @full: preallocate all data by writing it to the device to ensure
> >> +# disk space is really available. This data may or may not be
> >> +# zero, depending on the image format and storage.
> >> +# @full preallocation also sets up metadata correctly.
> >> #
> >> # Since: 2.2
> >> ##
> >> diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
> >> index 91ab0eceae..c02547e28c 100644
> >> --- a/docs/qemu-block-drivers.texi
> >> +++ b/docs/qemu-block-drivers.texi
> >> @@ -31,8 +31,8 @@ Supported options:
> >> @item preallocation
> >> Preallocation mode (allowed values: @code{off}, @code{falloc},
> >> @code{full}).
> >> @code{falloc} mode preallocates space for image by calling
> >> posix_fallocate().
> >> -@code{full} mode preallocates space for image by writing zeros to
> >> underlying
> >> -storage.
> >> +@code{full} mode preallocates space for image by writing data to
> >> underlying
> >> +storage. This data may or may not be zero, depending on the storage
> >> location.
> >> @end table
> >>
> >> @item qcow2
> >> diff --git a/qemu-img.texi b/qemu-img.texi
> >> index c8e9bba515..b5156d6316 100644
> >> --- a/qemu-img.texi
> >> +++ b/qemu-img.texi
> >> @@ -666,8 +666,8 @@ Supported options:
> >> @item preallocation
> >> Preallocation mode (allowed values: @code{off}, @code{falloc},
> >> @code{full}).
> >> @code{falloc} mode preallocates space for image by calling
> >> posix_fallocate().
> >> -@code{full} mode preallocates space for image by writing zeros to
> >> underlying
> >> -storage.
> >> +@code{full} mode preallocates space for image by writing data to
> >> underlying
> >> +storage. This data may or may not be zero, depending on the storage
> >> location.
> >> @end table
> >>
> >> @item qcow2
> >
> > Just a question:
>
> But a very good one, actually.
:-)
>
> > if a protocol driver returns 1 with the .bdrv_has_zero_init callback, is it
> > expected that the preallocation will fill the image with zeroes?
>
> Yes.
>
> > IIUC, for example, the qcow2 returns 1 with the .bdrv_has_zero_init. but
> > during
> > the qcow2_co_truncate() it calls bdrv_co_truncate() and, depending on the
> > backend driver, it should fill the image with zeroes (or not?).
>
> Yes.
>
> > Maybe I miss something related to the metadata...
>
> Well. If the image isn’t preallocated, all will be well because nothing
> of the added range is entered into the metadata, so it returns zero when
> read (unless there is a backing file, but that is handled independently
> of has_zero_init).
Okay, that makes sense.
>
> But you were asking about preallocation. As I wrote in the commit
> message, the qcow2 driver lets the protocol driver preallocate the data
> and then enters it as normal data clusters into its metadata. If the
> image is encrypted, it will appears as random data (or if the protocol
> dirver writes non-zero data). But then it shouldn’t report
> has_zero_init as 1.
>
> Let’s test it.
>
> $ qemu-img create -f qcow2 src.qcow2 64M
> Formatting 'src.qcow2', fmt=qcow2 size=67108864 cluster_size=65536
> lazy_refcounts=off refcount_bits=16
> $ qemu-img convert -O qcow2 \
> -o \
> encrypt.format=luks,encrypt.key-secret=pass,preallocation=metadata \
> --object secret,id=pass,data=123456 \
> src.qcow2 dest.qcow2
> $ qemu-img compare --image-opts \
> file.filename=src.qcow2 \
> file.filename=dest.qcow2,encrypt.key-secret=pass \
> --object secret,id=pass,data=123456
> Content mismatch at offset 0!
>
> Oops.
>
>
> So. We can do two things here.
>
> (A) We drop this patch and actually make sure that preallocation always
> writes zeroes, and if that cannot be done efficiently, then too bad.
>
> Note that for qcow2, we cannot just mark all clusters as zero clusters
> because that would kind of defeat the purpose of metadata preallocation.
> One of its main purposes is to prevent COW when you write to a new
> image, i.e. that the qcow2 driver needs to do a read-modify-write cycle
> just to zero a new cluster. If we kept preallocating potentially random
> data and hooked it up as preallocated zero clusters, those RMW cycles
> would remain, thus defeating the point of metadata preallocation. So
> even for qcow2, if there is a chance that the data stored is not zero,
> it needs to explicitly store zeroes then.
>
> But the good thing here is that the protocol driver would always
> guarantee that it preallocates pure zeroes.
>
> The bad thing is that I don’t think we could support pure metadata or
> falloc preallocation together with encryption any longer, which would
> definitely be an incompatible change. Well, because we wouldn’t want to
> break this support, I suppose we would in this case (encryption) resort
> to linking the data clusters as preallocated zero clusters. Which is
> bad because of RMW. but well. That’s life.
Yes, I agree that it is a bad thing.
>
>
> (B) We keep this patch and audit our use of bdrv_has_zero_init(). So
> for qcow2, we need to return 0 for encrypted images. That is suboptimal
> for non-preallocated encrypted images, but again, that’s life.
>
> Also, we need to return 0 if the protocol layer does not preallocate the
> data to be zero -- which we can see from its setting of
> bdrv_has_zero_init(), I’d suppose. So for preallocated unencrypted
> images, qcow2 would need to return bdrv_has_zero_init(s->data_file->bs).
Yes, that's what I had in mind, too.
>
> But here’s the problem: How do we know whether an image has been
> preallocated or not? And that is a problem. I don’t know a solution
> off the top of my head. We could add a parameter to
> bdrv_has_zero_init() for that, but what would blockdev-mirror assume?
> It doesn’t know.
Why we need to know if an image has been preallocated with zeroes or not?
Maybe when we convert it, but I'm not sure.
>
> Hm. I suppose qcow2’s implementation could sift through its L2 tables
> and if there are any links to data clusters, it is by definition in some
> way preallocated.
>
>
> So I suppose B seems like the better solution if there is a way for all
> format drivers to determine whether their image has been preallocated or
> not?
Seems like the better solution to me, also if I miss something why we need to
determine whether or not it's been pre-allocated. :-(
Thanks for very detailed answer,
Stefano