qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5 07/12] qemu-img: Empty images after commit


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v5 07/12] qemu-img: Empty images after commit
Date: Tue, 22 Apr 2014 20:01:32 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 22.04.2014 19:48, Eric Blake wrote:
On 04/22/2014 10:22 AM, Max Reitz wrote:
On 22.04.2014 17:19, Eric Blake wrote:
On 04/17/2014 03:59 PM, Max Reitz wrote:
After the top image has been committed into an image in its backing
chain, all images above that base image should be emptied to restore the
old qemu-img commit behavior.

Signed-off-by: Max Reitz <address@hidden>
---
   qemu-img.c | 87
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
   1 file changed, 84 insertions(+), 3 deletions(-)
Does emptying an image take significant time?  If so, does that need to
be reflected in the progress meter?
For a 16 GB image I have here (should be nearly full) it took 1:22 min.
Copying it took six minutes, so I guess committing it would take even
more. I think the ratio is small enough not to include it in the
progress meter.

Furthermore, I don't see a reasonable implementation for a make_empty
progress output: As a general implementation for all image formats
implementing discard will not work (the qcow2 implementation clearly
states that discarded sectors should read back as zero), we would need
some way of returning the progress in every format-specific make_empty
implementation. This would probably require callbacks which I don't see
feasible.
Thinking about it more, the ability to empty an image after commit
should be optional, and exposed as an option for the user to control.

I can see at least reasons to use a commit action:

1. I have 'base <- active', made changes in active, and want the current
state of active to become my new base, but still want to preserve active
for making further changes.  In this situation, I _want_ the commit
action to empty active (so that it is once again the minimal size).

2. I have 'base <- snap' long enough for me to copy base somewhere else
as a form of snapshot management, but now want to get rid of the
temporary 'snap' file by committing the chain back down to just 'base',
making 'base' once again the active file.  In this situation, I don't
want to waste ANY time modifying 'snap', so the operation should end as
soon as base is finished.

Furthermore, consider what happens when committing across more than one
image, as in 'base <- snap1 <- snap2' all the way down to 'base'.  The
moment that I copy a cluster from 'snap2' into 'base' that has not also
been changed in 'snap1', I have already invalidated the 'snap1' image.
Or more concretely, using XX for any cluster that refers to the parent:

base:  AA BB CC DD
snap1: EE XX FF XX
snap2: GG HH XX XX

The guest sees: GG HH FF DD.  Someone looking at just snap1 sees: EE BB
FF DD, which matches data that the guest saw at some point in the past.

The commit operation, prior to the empty step, changes things to:

base:  GG HH FF DD
snap1: EE XX FF XX
snap2: GG HH XX XX

Someone looking at just snap1 NOW sees: EE HH FF DD, which is not
something that the guest ever saw.  We have made an inconsistent image.

If you then empty the entire chain, we have:

base:  GG HH FF DD
snap1: XX XX XX XX
snap2: XX XX XX XX

Someone looking at snap2 still sees GG HH FF DD, which means the guest
does not notice any difference on whether we stuck with snap2 as the
active, or whether we changed over to base as the new active.  Someone
looking at snap1 NOW sees GG HH FF DD, which is a guest-consistent view,
but is NOT what the guest saw at the point snap1 was created.

In other words, I'm arguing that the ONLY time that leaving the active
image empty after completing a commit is when we are committing ONLY one
file deep - the moment multiple files are involved, we have ALREADY
invalidated the intermediate files, so wasting time on them to make them
empty is pointless.  If I wanted my entire chain to be consistent, I
would have already done it in stages (commit snap2 into snap1, then
commit that into base).

You're right. Great, then I can drop this ugly list for keeping the backing chain.

Furthermore, isn't the ability to empty an image precisely what
'qemu-img rebase' already gives us?  That is, for my two use cases
above, use case 2 never wants to empty images, and use case 1 has only a
single image to empty.  But observe what happens if we do the commit of
'base <- active' into just 'base' while NOT emptying 'active', and then
follow it up with a command to do 'qemu-img rebase -b base active'.  The
point of the rebase operation is that any cluster in 'active' that
duplicates a cluster in 'base' is discarded from 'active' - which has
the net result of making 'active' empty.  Thus, for my use case 1 above,
the option to empty an image is merely syntactic sugar for doing a
commit without empty followed by a rebase onto the same base that we
just committed into.

I took a look into rebase because I wondered how it could discard sectors in a way they don't read back as zero but fall through to the backing file (which is what a generic implementation of bdrv_make_empty() would require and which I would have missed so far, if it existed). As far as I see it, rebase does not discard sectors, it only writes differing currently unallocated sectors.

And if I have missed something and rebase actually does discard sectors, it is still a rather expensive operation, as it has to compare the whole file basically to itself, in this case. commit leaving the image empty (after specifying the appropriate option, which should in my opinion default to true, as this is the current mode of operation) is not only syntactic sugar, but faster as well.

Max

An in my opinion more reasonable approach would be to speed up the
implementation of make_empty for qcow2 - as qcow does it, we could just
create a new empty image (that is, rewrite the L1 and refcount table and
truncate the image appropriately) instead of discarding every cluster in
the existing file. This would be much faster, but the implementation
would not be as nice and easy, so I chose not to implement it. I guess
I'll have a second look, then. ;-)
We can't throw away internal snapshots (I see you already figured that
out), but indeed, recreating an empty image is faster than explicitly
emptying cluster-by-cluster if we know there is nothing else to be lost.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]