qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Preallocation does not requir


From: Stefano Garzarella
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Preallocation does not require writing zeroes
Date: Mon, 15 Jul 2019 10:01:38 +0200
User-agent: NeoMutt/20180716

On Fri, Jul 12, 2019 at 08:33:14PM +0200, Max Reitz wrote:
> On 12.07.19 12:27, Stefano Garzarella wrote:
> > On Thu, Jul 11, 2019 at 03:29:35PM +0200, Max Reitz wrote:
> >> When preallocating an encrypted qcow2 image, it just lets the protocol
> >> driver write data and then does not mark the clusters as zero.
> >> Therefore, reading this image will yield effectively random data.
> >>
> >> As such, we have not fulfilled the promise of always writing zeroes when
> >> preallocating an image in a while.  It seems that nobody has really
> >> cared, so change the documentation to conform to qemu's actual behavior.
> >>
> >> Signed-off-by: Max Reitz <address@hidden>
> >> ---
> >>  qapi/block-core.json         | 9 +++++----
> >>  docs/qemu-block-drivers.texi | 4 ++--
> >>  qemu-img.texi                | 4 ++--
> >>  3 files changed, 9 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> index 0d43d4f37c..a4363b84d2 100644
> >> --- a/qapi/block-core.json
> >> +++ b/qapi/block-core.json
> >> @@ -5167,10 +5167,11 @@
> >>  # @off: no preallocation
> >>  # @metadata: preallocate only for metadata
> >>  # @falloc: like @full preallocation but allocate disk space by
> >> -#          posix_fallocate() rather than writing zeros.
> >> -# @full: preallocate all data by writing zeros to device to ensure disk
> >> -#        space is really available. @full preallocation also sets up
> >> -#        metadata correctly.
> >> +#          posix_fallocate() rather than writing data.
> >> +# @full: preallocate all data by writing it to the device to ensure
> >> +#        disk space is really available. This data may or may not be
> >> +#        zero, depending on the image format and storage.
> >> +#        @full preallocation also sets up metadata correctly.
> >>  #
> >>  # Since: 2.2
> >>  ##
> >> diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
> >> index 91ab0eceae..c02547e28c 100644
> >> --- a/docs/qemu-block-drivers.texi
> >> +++ b/docs/qemu-block-drivers.texi
> >> @@ -31,8 +31,8 @@ Supported options:
> >>  @item preallocation
> >>  Preallocation mode (allowed values: @code{off}, @code{falloc}, 
> >> @code{full}).
> >>  @code{falloc} mode preallocates space for image by calling 
> >> posix_fallocate().
> >> -@code{full} mode preallocates space for image by writing zeros to 
> >> underlying
> >> -storage.
> >> +@code{full} mode preallocates space for image by writing data to 
> >> underlying
> >> +storage.  This data may or may not be zero, depending on the storage 
> >> location.
> >>  @end table
> >>  
> >>  @item qcow2
> >> diff --git a/qemu-img.texi b/qemu-img.texi
> >> index c8e9bba515..b5156d6316 100644
> >> --- a/qemu-img.texi
> >> +++ b/qemu-img.texi
> >> @@ -666,8 +666,8 @@ Supported options:
> >>  @item preallocation
> >>  Preallocation mode (allowed values: @code{off}, @code{falloc}, 
> >> @code{full}).
> >>  @code{falloc} mode preallocates space for image by calling 
> >> posix_fallocate().
> >> -@code{full} mode preallocates space for image by writing zeros to 
> >> underlying
> >> -storage.
> >> +@code{full} mode preallocates space for image by writing data to 
> >> underlying
> >> +storage.  This data may or may not be zero, depending on the storage 
> >> location.
> >>  @end table
> >>  
> >>  @item qcow2
> > 
> > Just a question:
> 
> But a very good one, actually.

:-)

> 
> > if a protocol driver returns 1 with the .bdrv_has_zero_init callback, is it
> > expected that the preallocation will fill the image with zeroes?
> 
> Yes.
> 
> > IIUC, for example, the qcow2 returns 1 with the .bdrv_has_zero_init. but 
> > during
> > the qcow2_co_truncate() it calls bdrv_co_truncate() and, depending on the
> > backend driver, it should fill the image with zeroes (or not?).
> 
> Yes.
> 
> > Maybe I miss something related to the metadata...
> 
> Well.  If the image isn’t preallocated, all will be well because nothing
> of the added range is entered into the metadata, so it returns zero when
> read (unless there is a backing file, but that is handled independently
> of has_zero_init).

Okay, that makes sense.

> 
> But you were asking about preallocation.  As I wrote in the commit
> message, the qcow2 driver lets the protocol driver preallocate the data
> and then enters it as normal data clusters into its metadata.  If the
> image is encrypted, it will appears as random data (or if the protocol
> dirver writes non-zero data).  But then it shouldn’t report
> has_zero_init as 1.
> 
> Let’s test it.
> 
> $ qemu-img create -f qcow2 src.qcow2 64M
> Formatting 'src.qcow2', fmt=qcow2 size=67108864 cluster_size=65536
> lazy_refcounts=off refcount_bits=16
> $ qemu-img convert -O qcow2 \
>     -o \
>     encrypt.format=luks,encrypt.key-secret=pass,preallocation=metadata \
>     --object secret,id=pass,data=123456 \
>     src.qcow2 dest.qcow2
> $ qemu-img compare --image-opts \
>     file.filename=src.qcow2 \
>     file.filename=dest.qcow2,encrypt.key-secret=pass \
>     --object secret,id=pass,data=123456
> Content mismatch at offset 0!
> 
> Oops.
> 
> 
> So.  We can do two things here.
> 
> (A) We drop this patch and actually make sure that preallocation always
> writes zeroes, and if that cannot be done efficiently, then too bad.
> 
> Note that for qcow2, we cannot just mark all clusters as zero clusters
> because that would kind of defeat the purpose of metadata preallocation.
>  One of its main purposes is to prevent COW when you write to a new
> image, i.e. that the qcow2 driver needs to do a read-modify-write cycle
> just to zero a new cluster.  If we kept preallocating potentially random
> data and hooked it up as preallocated zero clusters, those RMW cycles
> would remain, thus defeating the point of metadata preallocation.  So
> even for qcow2, if there is a chance that the data stored is not zero,
> it needs to explicitly store zeroes then.
> 
> But the good thing here is that the protocol driver would always
> guarantee that it preallocates pure zeroes.
> 
> The bad thing is that I don’t think we could support pure metadata or
> falloc preallocation together with encryption any longer, which would
> definitely be an incompatible change.  Well, because we wouldn’t want to
> break this support, I suppose we would in this case (encryption) resort
> to linking the data clusters as preallocated zero clusters.  Which is
> bad because of RMW. but well.  That’s life.

Yes, I agree that it is a bad thing.

> 
> 
> (B) We keep this patch and audit our use of bdrv_has_zero_init().  So
> for qcow2, we need to return 0 for encrypted images.  That is suboptimal
> for non-preallocated encrypted images, but again, that’s life.
> 
> Also, we need to return 0 if the protocol layer does not preallocate the
> data to be zero -- which we can see from its setting of
> bdrv_has_zero_init(), I’d suppose.  So for preallocated unencrypted
> images, qcow2 would need to return bdrv_has_zero_init(s->data_file->bs).

Yes, that's what I had in mind, too.

> 
> But here’s the problem: How do we know whether an image has been
> preallocated or not?  And that is a problem.  I don’t know a solution
> off the top of my head.  We could add a parameter to
> bdrv_has_zero_init() for that, but what would blockdev-mirror assume?
> It doesn’t know.

Why we need to know if an image has been preallocated with zeroes or not?

Maybe when we convert it, but I'm not sure.

> 
> Hm.  I suppose qcow2’s implementation could sift through its L2 tables
> and if there are any links to data clusters, it is by definition in some
> way preallocated.
> 
> 
> So I suppose B seems like the better solution if there is a way for all
> format drivers to determine whether their image has been preallocated or
> not?

Seems like the better solution to me, also if I miss something why we need to
determine whether or not it's been pre-allocated. :-(


Thanks for very detailed answer,
Stefano



reply via email to

[Prev in Thread] Current Thread [Next in Thread]