qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support


From: Kevin Wolf
Subject: Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
Date: Fri, 08 Jun 2012 19:57:33 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

Am 08.06.2012 19:46, schrieb Jeff Cody:
> On 06/08/2012 12:11 PM, Kevin Wolf wrote:
>> Am 08.06.2012 16:32, schrieb Jeff Cody:
>>> On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
>>>> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <address@hidden> wrote:
>>>>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>>>>> Note that block-commit cannot work on the top-most image since the
>>>>>> guest is still writing to that image and we might never be able to
>>>>>> copy all the data into the base image (the guest could write new data
>>>>>> as quickly as we copy it to the base).  The command should check for
>>>>>> this and reject the top-most image.
>>>>>
>>>>> By this you mean that you would like to disallow committing the
>>>>> top-level image to the base?  Perhaps there is a way to attempt to
>>>>> converge, and adaptively give more time to the co-routine if we are able
>>>>> to detect divergence.  This may require violating the 'speed' parameter,
>>>>> however, and make the commit less 'live'.
>>>>
>>>> Yes, I think we should disallow merging down the topmost image.  The
>>>> convergence problem is the same issue that live migration has.  It's a
>>>> hard problem and not something that is essential for block-commit.  It
>>>> would add complexity into the block-commit implementation - the only
>>>> benefit would be that you can merge down the last COW snapshot.  We
>>>> already have an implementation that does this: the "commit" command
>>>> which stops the guest :).
>>>
>>> OK.  And, it is better to get this implemented now, and if desired, I
>>> suppose we can always revisit the convergence at a later date (or not at
>>> all, if there is no pressing case for it).
>>
>> Not allowing live commit for the topmost image may be a good way to
>> break the problem down into more manageable pieces, but eventually the
>> feature isn't complete until you can do it.
>>
>> Basically what you need there is the equivalent of an active mirror. I
>> wouldn't be surprised if actually code could be reused for both.
> 
> I agree, doing it like mirroring for new writes on the top layer makes
> sense, as long as you are willing to violate the (optional) speed
> parameter.  I wouldn't think violating the speed parameter in that case
> would be an issue, as long as it is a documented affect of performing a
> live commit on the active (top) layer.

The question is what the speed parameter really means for a live commit
block job (or for an active mirror). I think it would make sense to
limit only the speed of I/O that doesn't come from the guest but is from
the background copying.

If you did apply it to guest I/O as well, then I think the logical
consequence would be that guest I/O is throttled as well because
requests only complete when they have reached both source and target.

(I know I'm happily mixing terminology of all kinds of live block jobs
here, hope you understand anyway what I mean ;-))

>>> I am happy to do it that way.  I'll shift my focus to the atomic image
>>> reopen in r/w mode.  I'll go ahead and post my diagrams and other info
>>> for block-commit on the wiki, because I don't think it conflicts with we
>>> discussed above (although I will modify my diagrams to not show commit
>>> from the top-level image).  Of course, feel free to change it as
>>> necessary.
>>
>> I may have mentioned it before, but just in case: I think Supriya's
>> bdrv_reopen() patches are a good starting point. I don't know why they
>> were never completed, but I think we all agreed on the general design,
>> so it should be possible to pick that up.
>>
>> Though if you have already started with your own work on it, Jeff, I
>> expect that it won't be much different because it's basically the same
>> transactional approach that you know and that we already used for group
>> snapshots.
> 
> I will definitely use parts of Supriya's as it makes sense - what I
> started work on is similar to bdrv_append() and the current transaction
> approach, so there will be plenty in common to reuse, even with some
> differences.

I'm not sure how far a bdrv_append-like approach will work in this case.
The problem is that you must not open any image r/w twice. The policy is
that you can open an image either once r/w or multiple times read-only.

Raw images are probably forgiving if you happen to open them twice r/w,
but more complex image formats like qcow2 or qed are likely to get
corrupted if you do it.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]