From: Max Reitz
Subject: Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
Date: Thu, 13 Sep 2018 20:37:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0

On 13.09.18 19:05, Eric Blake wrote:
> [adding Markus, because of an interesting observation about --image-opts
> vs. JSON null - search for [1] below]
> 
> On 9/13/18 8:22 AM, Max Reitz wrote:
>> On 13.09.18 05:33, lampahome wrote:
>>> I split data to 3 chunks and save it in 3 independent backing files like
>>> below:
>>> img.000 <-- img.001 <-- img.002
>>> img.000 is the backing file of img.001 and 001 is the backing file of
>>> 002.
>>> img.000 saves the 1st chunk of data and img.001 saves the 2nd chunk of
>>> data, and img.002 saves the 3rd chunk of data.
> 
> How have you ensured that these three files are visiting different
> ranges of guest data?

He did say "independent".

> It sounds like you are trying to keep the sizes of .000, .001, and .002
> constant, but updating their respective contents.  Rather unusual, but
> not necessarily a bad idea.
> 
>>>
>>> Now I have img.003 stores cow data of 1st chunk and img.002 is the
>>> backing
>>> file of img.003.
>>> The backing chain is like this:
>>>    img.000 <-- img.001 <-- img.002 <-- img.003
>>>
>>> So that means the data of img.003 saves the same range with img.000 but
>>> different data.
>>>
>>> I know I can use *`qemu-img commit`* but it only commits the data from
>>> img.003 to img.002.
> 
> Which, if the guest ranges covered by .000 and .002 are originally
> distinct, makes .002 grow in size for any changes that .003 has made
> relative to .000 or .001, rather than writing to the respective backing
> file.
> 
>>>
>>> If I use *`qemu-img rebase -b img.000 img.003`*, the data of img.001 and
>>> img.002 will merge into img.003.
> 
> Which makes .000 grow in size, because you didn't limit how much of .003
> gets committed.

I probably shouldn't interpret intentions here, but he did say "img.003
stores cow data of 1st chunk".  Which to me sounded like .003 does not
have any changes relative to .001 or .002, so .000 should not grow in size.

> But maybe it's possible to use the 'offset' and 'size'
> parameters to the raw format driver to make qemu-img see only a subset
> of img.003, at which point committing just that subset is easier.

No, because raw is not marked a filter driver, so you cannot commit
through it.

(In fact, you cannot even commit through filter drivers now.)

And this is probably correct, because it is exactly that offset and size
that make the raw BDS (BlockDriverState) present different data than its
child.  So it isn't a filter.

> Hmm -
> it might work for img.000, but not so easily for img.001 or img.002,
> because we don't have a clean way to copy from one source offset to a
> different destination offset.  Last month, I proposed a patch to enhance
> 'qemu-img dd' to do that - but the argument was that 'qemu-img convert'
> should also be able to do it, with 'qemu-img dd' being a thin veneer
> over convert rather than doing everything itself, so there's still work
> to be done.
> 
>>>
>>> What I want is to commit only the data in img.003 into img.000, because
>>> the data of the two images covers the same range (1st chunk).
>>>
>>> Is there any way to commit (or merge) data of the active image into the
>>> corresponding backing file?
>>
>> So img.000, img.001, and img.002 all contain data at completely
>> different areas, and img.003 only contains data where img.000 contains
>> data as well?
>>
>> Say like so:
>>
>> $ qemu-img create -f qcow2 img.000 3M
>> $ qemu-img create -f qcow2 -b img.000 img.001
>> $ qemu-img create -f qcow2 -b img.001 img.002
>> $ qemu-img create -f qcow2 -b img.002 img.003
> 
> Missing -F qcow2 in those last three lines (you should always specify
> the backing format in the qcow2 metadata, otherwise you are setting
> yourself up for failures because probing is unsafe)

Is it really unsafe for non-raw images?

>> $ qemu-io -c 'write -P 1 0M 1M' img.000
>> $ qemu-io -c 'write -P 2 1M 1M' img.001
>> $ qemu-io -c 'write -P 3 2M 1M' img.002
>> $ qemu-io -c 'write -P 4 0M 1M' img.003
> 
> I'd modify this example to use:
>  qemu-io -c 'write -P 4 0M 512k' -c 'write -P 4 1m 512k' \
>    -c 'write -P 4 2m 512k' img.003
> 
> so that it becomes easier to see if we are ever committing more than
> desired.

Well, I interpreted the problem in a way that .003 does not shadow any
data from .001 or .002.

>>
>> (img.000 contains 1s from 0M to 1M;
>>   img.001 contains 2s from 1M to 2M;
>>   img.002 contains 3s from 2M to 3M;
>>   img.003 contains 4s from 0M to 1M (the range of img.000))
> 
> Or, visually, with my tweak to img.003,
> 
> img.000     11----
> img.001     --22--
> img.002     ----33
> img.003     4-4-4-
> guest sees  414243
> 
> and your goal, if I'm understanding, is to do range-based commits so
> that you end up with:
> 
> img.000     41----
> img.001     --42--
> img.002     ----43
> img.003     ------
> guest sees  414243
> 
>>
>> In that case, rebase -u might be what you want, so the following should
>> work (although it can easily corrupt your data if it isn't the case[1]):
>>
>> $ qemu-img rebase -u -b img.000 img.003
>> $ qemu-img commit img.003
> 
> No, that still copies anything that img.003 has changed from .001 or
> .002 into .000, making .000 grow in size (that is, your approach changed
> img.000 to read 414-4-).

Well, I definitely misunderstood the issue if .003 changed anything from
.001 or .002, because I didn't read that from the description.  To me,
it sounded like .003 only changed data that's in .000.
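
To make the two readings concrete, here is a toy model (plain Python, not
anything QEMU does internally; the function names are made up for this
sketch).  Each image is a string with one character per 512k, '-' meaning
"not allocated, defer to the backing file".  It only illustrates what
rebase -u followed by commit would leave in img.000 under each reading,
and what footnote [1] warns about:

```python
# Toy model only: one character per 512k cluster, '-' = unallocated.

def guest_view(chain):
    """Read through a chain listed base first, active image last."""
    width = len(chain[0])
    view = []
    for i in range(width):
        # The topmost image that has the cluster allocated wins.
        data = next((img[i] for img in reversed(chain) if img[i] != '-'), '0')
        view.append(data)
    return ''.join(view)

def commit(top, base):
    """Copy every allocated cluster of top into base (like qemu-img commit)."""
    return ''.join(t if t != '-' else b for t, b in zip(top, base))

img000, img001, img002 = '11----', '--22--', '----33'

# My reading: img.003 only touches the range of img.000.
print(commit('44----', img000))            # 44----

# Eric's variant: img.003 also shadows parts of img.001 and img.002.
new000 = commit('4-4-4-', img000)
print(new000)                              # 414-4-

# After the commit img.003 is empty and gets rebased back onto img.002;
# the 4s that landed in img.000 are now hidden behind img.001 and img.002.
print(guest_view([new000, img001, img002, '------']))   # 412233
```

In the second case the guest used to see 414243, so the shortcut is only
safe when img.003 really is confined to the range of img.000.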

> If you can view just a subset of img.003,
> then you CAN commit just that subset into img.000 (but not into .001 or
> .002, because we don't yet have 'qemu-img commit --target-image-opts' to
> specify the 'offset=' argument to the raw driver).  So here's what I tried:
> 
> $ qemu-io -c 'r -P 4 0 512k' -c 'r -P 1 512k 512k' -c map --image-opts \
>     driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003
> 
> read 524288/524288 bytes at offset 0
> 512 KiB, 1 ops; 0.0002 sec (1.719 GiB/sec and 3521.1268 ops/sec)
> read 524288/524288 bytes at offset 524288
> 512 KiB, 1 ops; 0.0004 sec (1.218 GiB/sec and 2493.7656 ops/sec)
> 512 KiB (0x80000) bytes     allocated at offset 0 bytes (0x0)
> 512 KiB (0x80000) bytes not allocated at offset 512 KiB (0x80000)
> 
> Yep - that fancy --image-opts syntax let us use a raw wrapper around
> qcow2 to see just the first 1M of img.003.  Now:
> 
> $ qemu-img commit --image-opts -b img.000 \
>     driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003
> 
> qemu-img: Did not find 'img.000' in the backing chain of
> 'driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003'
> 
> 
> Alas, since 'raw' does not have backing files on its own, qemu-img
> commit refuses to do anything (it will only commit into a known backing
> chain).  I know Max has a proposed series to make filters behave more
> sanely (so that the backing file of an original node is also seen to be
> the backing file of a filter node), but I don't know if that would
> completely help here (the fact that the raw format node is being used
> more as a filter is a bit different from normally using it as a format
> driver - maybe we want size/offset limitations to be an actual filter
> node, separate from the raw format driver?).

As I said, that isn't a filter.  A filter does not change what data is
visible, and that's very important.

Because for instance, for committing, you need to be able to go
backwards.  So you read something at offset X from the filter, and you
want to commit it down the chain -- of course, you write it to offset X
in the target backing file.  But if you use a raw node with an offset,
that changes, so we'd need to be able to translate it back.

(More generally, if you change the data that's visible, the "data
filter" node would need to provide a way to translate the data.  Well,
the way is clearly there, it's the write function it provides, but we'd
need to do some funky stuff to employ it.)
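
A small sketch of the offset problem, purely as an illustration: the two
classes below are invented stand-ins, not QEMU data structures.  A filter
maps offset X to offset X in its child; a raw node with 'offset' set does
not, so a commit that writes at the offset it read from would land in the
wrong place unless something translated it back.

```python
# Toy model, not QEMU code: a node maps an offset it exposes to an
# offset in its child.

class Filter:
    """A true filter: what you read at X is what the child has at X."""
    def to_child(self, offset):
        return offset

class RawWindow:
    """Stand-in for the raw driver with 'offset' and 'size' set."""
    def __init__(self, offset, size):
        self.offset = offset
        self.size = size

    def to_child(self, offset):
        if not 0 <= offset < self.size:
            raise ValueError("outside the window")
        return offset + self.offset

MiB = 1024 * 1024
window = RawWindow(offset=1 * MiB, size=1 * MiB)

# Commit reads at X from the top node and writes at X into the backing file.
# Through a filter that is right; through the window it is off by 'offset'.
x = 0
print(Filter().to_child(x))   # 0
print(window.to_child(x))     # 1048576
```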

> But I'm not giving up just yet - we can use qemu-img convert to create a
> temporary file that contains only the data we want committed:
> 
> $ qemu-img convert -O qcow2 -B img.000 --image-opts \
>     driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003 \
>     img.004
> 
> achieving:
> 
> img.000     11----
> img.001     --22--
> img.002     ----33
> img.003     4-4-4-
> guest sees  414243
> img.004     4-
> 
> and now commit that:
> 
> $ qemu-img commit img.004
> 
> and double-check what img.000 now contains:
> 
> $ qemu-io -c 'r -P 4 0 512k' -c 'r -P 1 512k 512k' img.000
> read 524288/524288 bytes at offset 0
> 512 KiB, 1 ops; 0.0001 sec (2.872 GiB/sec and 5882.3529 ops/sec)
> read 524288/524288 bytes at offset 524288
> 512 KiB, 1 ops; 0.0002 sec (2.078 GiB/sec and 4255.3191 ops/sec)
> 
> so now we have achieved:
> 
> img.000     41----
> img.001     --22--
> img.002     ----33
> img.003     4-4-4-
> guest sees  414243
> img.004     --
> 
> Which is not quite our end goal - we have not yet freed the storage in
> img.003, AND img.004 is still wasting storage space. We can delete
> img.004 now, but I know of no way to force img.003 to deallocate those
> clusters.  Attempting:
> 
> [1]
> $ qemu-io -c 'discard 0 1m' --image-opts \
>     driver=qcow2,backing=,file.driver=file,file.filename=img.003
> warning: Use of "backing": "" is deprecated; use "backing": null instead
> discard 1048576/1048576 bytes at offset 0
> 1 MiB, 1 ops; 0.0002 sec (4.399 GiB/sec and 4504.5045 ops/sec)
> 
> doesn't work, as 'discard' causes img.003 to now make things read as
> zero rather than deferring to the backing chain,

Which is intentional because making data re-appear from the backing
chain can be a security issue, as far as I remember.

> even though I
> specifically told qemu to operate as if img.003 has no backing image

discard just says "I don't care what data appears there".  For qcow2 v3
the simplest way is to make it a zero cluster.
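
As a toy illustration of why a zero cluster does not let the backing data
show through again (Python, one character per 512k; the helper names are
invented, and the sketch assumes clusters that were never allocated are
left alone, as in the result you show further down):

```python
# Toy model only: one character per 512k cluster.
#   '-'  unallocated, read falls through to the backing file
#   'z'  zero cluster, reads as zeroes and hides the backing file
#   else allocated data

def read_cluster(chain, i):
    """chain is listed base first, active image last."""
    for img in reversed(chain):
        if img[i] == 'z':
            return '0'
        if img[i] != '-':
            return img[i]
    return '0'          # nothing allocated anywhere reads as zeroes

def discard(img, start, count):
    """Turn allocated clusters in [start, start + count) into zero clusters."""
    return ''.join('z' if start <= i < start + count and c != '-' else c
                   for i, c in enumerate(img))

chain = ['41----', '--22--', '----33', '4-4-4-']
chain[3] = discard(chain[3], 0, 2)
print(chain[3])                                             # z-4-4-
print(''.join(read_cluster(chain, i) for i in range(6)))    # 014243
```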

> (it DOES reduce the disk space occupied by img.003, although
> not the file size - compare 'ls -l' and 'du' output before and after the
> attempt - which means the 'discard' DID end up punching a hole in the
> host file).
> 
> Also, that warning message is annoying.  We can't spell 'backing=null'
> because that tries to find a node named "null"; to avoid it, we'd have
> to support using --image-opts with JSON on the command line instead of
> dotted names, as in:
> 
> $ qemu-io -c 'discard 0 1m' --image-opts '{"driver":"qcow2",
> "backing":null, "file":{"driver":"file", "filename":"img.003"}}'
> 
> except THAT doesn't work yet (we haven't converted all our command line
> arguments to taking JSON yet). (end [1])

I hate json:{}, but we have it, so why not use it?

$ qemu-io -c 'discard 0 1m' \
    "json:{'driver':'qcow2','backing':null,
           'file':{'driver':'file','filename':'img.003'}}"
discard 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0000 sec (10.389 GiB/sec and 10638.2979 ops/sec)

> I guess I can avoid the warning message by using multiple steps for
> temporarily having no backing file:
> 
> $ qemu-img rebase -u -b '' img.003
> $ qemu-io -c 'discard 0 1m' img.003
> discard 1048576/1048576 bytes at offset 0
> 1 MiB, 1 ops; 0.0002 sec (4.811 GiB/sec and 4926.1084 ops/sec)
> $ qemu-img rebase -u -F qcow2 -b img.002 img.003
> 
> But whether I use the one-liner with --image-opts or the multi-step with
> explicit 'rebase -u'  I've botched things, because now I have:
> 
> img.000     41----
> img.001     --22--
> img.002     ----33
> img.003     z-4-4-
> guest sees  014243
> 
> To restore things back for further playing around, do
> $ qemu-io -c 'w -P 4 0 512k' img.003
> 
> Hmm, another idea:
> $ qemu-img rebase -f qcow2 -b img.002 -F qcow2 img.003
> 
> Nope, doesn't work - it doesn't do deduplication by removing clusters in
> img.003 that are identical to the clusters in the underlying backing
> chain (img.003 still contains '4-4-4-' instead of the desired '--4-4-').
> So that sounds like yet another missing feature to add later.
> 
>>
>> (And then maybe
>> $ qemu-img rebase -u -b img.002 img.003
>> to return to the previous backing chain.)
>>
>> Max
>>
>>
>> [1] It will corrupt your data if img.001 or img.002 contain any data
>> where img.003 also contains data; because then that data of img.003 will
>> be hidden when viewed through img.001 and img.002.
> 
> Sorry - for all my experimenting, I could NOT find a reliable way to
> remove duplicated clusters out of img.003 once they were committed to
> img.000,

I'm not sure whether your experiments really concern what the reporter
needs in his exact case, but just for fun:

Basically, there is only one way to reliably make an image pass through
data from its backing files again.  Well, two, actually.  One is
qemu-img commit, which (for compatibility, mainly) makes the image empty
after the commit.  The other is just throwing the image away and
re-creating it from scratch.

So in any case, you cannot reliably do that for just a part of the image.

First, split .003 into the part we want to commit and the part we don't
want to commit.  This is a bit tricky without qemu-img dd @seek (or a
corresponding convert parameter), so we'll have to make do with
backing=null so we don't copy anything into the output from img.003's
backing chain.

Or, we would have to use backing=null, but for some reason that doesn't
work.  I'll have to investigate.

So a plain rebase -u to drop the backing file will have to do:

$ qemu-img rebase -u -b '' img.003

$ qemu-img convert -O qcow2 \
    "json:{'driver':'raw','offset':0,'size':1048576,\
           'file':{'driver':'qcow2',\
                   'file':{'driver':'file','filename':'img.003'}}}" \
    "json:{'driver':'null-co','size':2097152}" \
    img.003.commit.000

$ qemu-img convert -O qcow2 \
    "json:{'driver':'null-co','size':1048576}" \
    "json:{'driver':'raw','offset':1048576,'size':2097152,\
           'file':{'driver':'qcow2',\
                   'file':{'driver':'file','filename':'img.003'}}}" \
    img.003.nocommit

Now let's set the backing files.  img.003.commit.000 has only data that
goes into img.000, so that goes there, and img.003.nocommit is going to
replace our old img.003, so that goes where that was:

$ qemu-img rebase -u -b img.000 img.003.commit.000
$ qemu-img rebase -u -b img.002 img.003.nocommit

And now let's commit:

$ qemu-img commit img.003.commit.000

And let's clean up:

$ rm img.003.commit.000
$ mv img.003.nocommit img.003

Done.

(If you want to commit all three parts of img.003 into the three
different base images, you would create img.003.commit.001 and
img.003.commit.002 similarly as above, and then commit those into the
respective base images.  Then you'd just rm img.003* and you're back to
the original state.)
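
If you do want all three parts, writing the json:{} names by hand gets
tedious.  Below is a rough sketch of a generator for the convert inputs,
assuming equally sized chunks laid out as in the example; the helper names
(json_name, window, padding, convert_inputs) are hypothetical, and it only
builds the strings, it does not run qemu-img.  It emits ordinary
double-quoted JSON rather than the single quotes used above.

```python
import json

MiB = 1024 * 1024

def json_name(options):
    """Build a json:{...} pseudo-filename from a dict of block options."""
    return 'json:' + json.dumps(options, separators=(',', ':'))

def window(filename, offset, size):
    """raw-over-qcow2 node exposing only [offset, offset + size)."""
    return json_name({
        'driver': 'raw', 'offset': offset, 'size': size,
        'file': {'driver': 'qcow2',
                 'file': {'driver': 'file', 'filename': filename}},
    })

def padding(size):
    """null-co placeholder node that only takes up 'size' bytes of layout."""
    return json_name({'driver': 'null-co', 'size': size})

def convert_inputs(filename, total, start, length):
    """Inputs for qemu-img convert: the window [start, start + length) of
    filename, padded with null-co so the result is still 'total' bytes."""
    inputs = []
    if start > 0:
        inputs.append(padding(start))
    inputs.append(window(filename, start, length))
    if start + length < total:
        inputs.append(padding(total - start - length))
    return inputs

# Middle chunk of the 3M image, i.e. the part that would go into img.001.
for name in convert_inputs('img.003', 3 * MiB, 1 * MiB, 1 * MiB):
    print(name)
```

With start 0 and length 1M this yields the same two inputs as the first
convert command above, and with start 1M and length 2M the same as the
second.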

Max

> nor a clean way to commit data from a subset of img.003 to the
> proper img.001 or img.002.  It is possible to manually use qemu-img map
> to learn which portions of img.003 should be copied, then use qemu-nbd
> to map both img.001 and img.003 to NBD devices, and use a series of dd
> commands to copy just those portions of the guest-visible data - but
> again, while that commits to the proper backing file, it does not
> discard the clusters from img.003.  Commit with "mode":"incremental"
> could be used to direct which portions of a file to commit, if you had
> an easy way to inject a bitmap describing that portion of the file, but
> we really don't have decent offline bitmap management via qemu-img yet.
> 
> So, while this thread has sparked some ideas for future improvements,
> the takeaway message for now is no, you really can't commit just a
> portion of one qcow2 image into another.
