qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Preallocation does not requir


From: Max Reitz
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] doc: Preallocation does not require writing zeroes
Date: Fri, 12 Jul 2019 20:33:14 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2

On 12.07.19 12:27, Stefano Garzarella wrote:
> On Thu, Jul 11, 2019 at 03:29:35PM +0200, Max Reitz wrote:
>> When preallocating an encrypted qcow2 image, it just lets the protocol
>> driver write data and then does not mark the clusters as zero.
>> Therefore, reading this image will yield effectively random data.
>>
>> As such, we have not fulfilled the promise of always writing zeroes when
>> preallocating an image in a while.  It seems that nobody has really
>> cared, so change the documentation to conform to qemu's actual behavior.
>>
>> Signed-off-by: Max Reitz <address@hidden>
>> ---
>>  qapi/block-core.json         | 9 +++++----
>>  docs/qemu-block-drivers.texi | 4 ++--
>>  qemu-img.texi                | 4 ++--
>>  3 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 0d43d4f37c..a4363b84d2 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -5167,10 +5167,11 @@
>>  # @off: no preallocation
>>  # @metadata: preallocate only for metadata
>>  # @falloc: like @full preallocation but allocate disk space by
>> -#          posix_fallocate() rather than writing zeros.
>> -# @full: preallocate all data by writing zeros to device to ensure disk
>> -#        space is really available. @full preallocation also sets up
>> -#        metadata correctly.
>> +#          posix_fallocate() rather than writing data.
>> +# @full: preallocate all data by writing it to the device to ensure
>> +#        disk space is really available. This data may or may not be
>> +#        zero, depending on the image format and storage.
>> +#        @full preallocation also sets up metadata correctly.
>>  #
>>  # Since: 2.2
>>  ##
>> diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
>> index 91ab0eceae..c02547e28c 100644
>> --- a/docs/qemu-block-drivers.texi
>> +++ b/docs/qemu-block-drivers.texi
>> @@ -31,8 +31,8 @@ Supported options:
>>  @item preallocation
>>  Preallocation mode (allowed values: @code{off}, @code{falloc}, @code{full}).
>>  @code{falloc} mode preallocates space for image by calling 
>> posix_fallocate().
>> -@code{full} mode preallocates space for image by writing zeros to underlying
>> -storage.
>> +@code{full} mode preallocates space for image by writing data to underlying
>> +storage.  This data may or may not be zero, depending on the storage 
>> location.
>>  @end table
>>  
>>  @item qcow2
>> diff --git a/qemu-img.texi b/qemu-img.texi
>> index c8e9bba515..b5156d6316 100644
>> --- a/qemu-img.texi
>> +++ b/qemu-img.texi
>> @@ -666,8 +666,8 @@ Supported options:
>>  @item preallocation
>>  Preallocation mode (allowed values: @code{off}, @code{falloc}, @code{full}).
>>  @code{falloc} mode preallocates space for image by calling 
>> posix_fallocate().
>> -@code{full} mode preallocates space for image by writing zeros to underlying
>> -storage.
>> +@code{full} mode preallocates space for image by writing data to underlying
>> +storage.  This data may or may not be zero, depending on the storage 
>> location.
>>  @end table
>>  
>>  @item qcow2
> 
> Just a question:

But a very good one, actually.

> if a protocol driver returns 1 with the .bdrv_has_zero_init callback, is it
> expected that the preallocation will fill the image with zeroes?

Yes.

> IIUC, for example, the qcow2 returns 1 with the .bdrv_has_zero_init. but 
> during
> the qcow2_co_truncate() it calls bdrv_co_truncate() and, depending on the
> backend driver, it should fill the image with zeroes (or not?).

Yes.

> Maybe I miss something related to the metadata...

Well.  If the image isn’t preallocated, all will be well because nothing
of the added range is entered into the metadata, so it returns zero when
read (unless there is a backing file, but that is handled independently
of has_zero_init).

But you were asking about preallocation.  As I wrote in the commit
message, the qcow2 driver lets the protocol driver preallocate the data
and then enters it as normal data clusters into its metadata.  If the
image is encrypted, it will appears as random data (or if the protocol
dirver writes non-zero data).  But then it shouldn’t report
has_zero_init as 1.

Let’s test it.

$ qemu-img create -f qcow2 src.qcow2 64M
Formatting 'src.qcow2', fmt=qcow2 size=67108864 cluster_size=65536
lazy_refcounts=off refcount_bits=16
$ qemu-img convert -O qcow2 \
    -o \
    encrypt.format=luks,encrypt.key-secret=pass,preallocation=metadata \
    --object secret,id=pass,data=123456 \
    src.qcow2 dest.qcow2
$ qemu-img compare --image-opts \
    file.filename=src.qcow2 \
    file.filename=dest.qcow2,encrypt.key-secret=pass \
    --object secret,id=pass,data=123456
Content mismatch at offset 0!

Oops.


So.  We can do two things here.

(A) We drop this patch and actually make sure that preallocation always
writes zeroes, and if that cannot be done efficiently, then too bad.

Note that for qcow2, we cannot just mark all clusters as zero clusters
because that would kind of defeat the purpose of metadata preallocation.
 One of its main purposes is to prevent COW when you write to a new
image, i.e. that the qcow2 driver needs to do a read-modify-write cycle
just to zero a new cluster.  If we kept preallocating potentially random
data and hooked it up as preallocated zero clusters, those RMW cycles
would remain, thus defeating the point of metadata preallocation.  So
even for qcow2, if there is a chance that the data stored is not zero,
it needs to explicitly store zeroes then.

But the good thing here is that the protocol driver would always
guarantee that it preallocates pure zeroes.

The bad thing is that I don’t think we could support pure metadata or
falloc preallocation together with encryption any longer, which would
definitely be an incompatible change.  Well, because we wouldn’t want to
break this support, I suppose we would in this case (encryption) resort
to linking the data clusters as preallocated zero clusters.  Which is
bad because of RMW. but well.  That’s life.


(B) We keep this patch and audit our use of bdrv_has_zero_init().  So
for qcow2, we need to return 0 for encrypted images.  That is suboptimal
for non-preallocated encrypted images, but again, that’s life.

Also, we need to return 0 if the protocol layer does not preallocate the
data to be zero -- which we can see from its setting of
bdrv_has_zero_init(), I’d suppose.  So for preallocated unencrypted
images, qcow2 would need to return bdrv_has_zero_init(s->data_file->bs).

But here’s the problem: How do we know whether an image has been
preallocated or not?  And that is a problem.  I don’t know a solution
off the top of my head.  We could add a parameter to
bdrv_has_zero_init() for that, but what would blockdev-mirror assume?
It doesn’t know.

Hm.  I suppose qcow2’s implementation could sift through its L2 tables
and if there are any links to data clusters, it is by definition in some
way preallocated.


So I suppose B seems like the better solution if there is a way for all
format drivers to determine whether their image has been preallocated or
not?

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]