Re: [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API


From: John Snow
Subject: Re: [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API
Date: Mon, 11 Dec 2017 13:40:50 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0


On 12/11/2017 06:15 AM, Kevin Wolf wrote:
> On 09.12.2017 at 01:57, John Snow wrote:
>> Here's an idea of what this API might look like without revealing
>> explicit merge/split primitives.
>>
>> A new bitmap property that lets us set retention:
>>
>> :: block-dirty-bitmap-set-retention bitmap=foo slices=10
>>
>> Or something similar, where the default property for all bitmaps is
>> zero -- the current behavior: no copies retained.
>>
>> By setting it to a non-zero positive integer, the incremental backup
>> mode will automatically save a disabled copy when possible.
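
To illustrate the shape of it, the QMP wire form of that hypothetical
command might look like this (no such command exists in QEMU today;
"drive0" and "foo" are placeholder node and bitmap names):

  { "execute": "block-dirty-bitmap-set-retention",
    "arguments": { "node": "drive0", "name": "foo", "slices": 10 } }
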
> 
> -EMAGIC
> 
> Operations that create or delete user-visible objects should be
> explicit, not automatic. You're trying to implement management layer
> functionality in qemu here, but incomplete enough that the artifacts of
> it are still visible externally. (A complete solution within qemu
> wouldn't expose low-level concepts such as bitmaps on an external
> interface, but you would expose something like checkpoints.)
> 
> Usually it's not a good idea to have a design where qemu implements
> enough to restrict management tools to whatever use case we had in mind,
> but not enough to make the management tool's life substantially easier
> (by not having to care about some low-level concepts).
> 
>> "What happens if we exceed our retention?"
>>
>> (A) We push the last one out automatically, or
>> (B) We fail the operation immediately.
>>
>> A is more convenient, but potentially unsafe if the management tool or
>> user wasn't aware that was going to happen.
>> B is more annoying, but definitely more safe as it means we cannot lose
>> a bitmap accidentally.
> 
> Both mean that the management layer not only has to deal with the
> deletion of bitmaps as it wants to have them, but also has to keep the
> retention counter somewhere and predict what qemu is going to do to the
> bitmaps and whether any corrective action needs to be taken.
> 
> This is making things more complex rather than simpler.
> 
>> I would argue for B with perhaps a force-cycle=true|false that defaults
>> to false to let management tools say "Yes, go ahead, remove the old one"
>> with additionally some return to let us know it happened:
>>
>> {"return": {
>>   "dropped-slices": [ {"bitmap0": 0}, ...]
>> }}
>>
>> This would introduce some concept of bitmap slices into the mix as ID'd
>> children of a bitmap. I would propose that these slices are numbered and
>> monotonically increasing. "bitmap0" as an object starts with no slices,
>> but every incremental backup creates slice 0, slice 1, slice 2, and so
>> on. Even after we start deleting some, they stay ordered. These numbers
>> then stand in for points in time.
>>
>> The counter can (must?) be reset and all slices forgotten when
>> performing a full backup while providing a bitmap argument.
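
To make that concrete, a forced cycle under this sketch might look
roughly like the following. drive-backup and its sync/bitmap/target
arguments are real; force-cycle and the dropped-slices return are
hypothetical, and today's command reports results through job events
rather than inline like this:

  { "execute": "drive-backup",
    "arguments": { "device": "drive0", "sync": "incremental",
                   "bitmap": "bitmap0", "force-cycle": true,
                   "target": "inc.2.qcow2", "format": "qcow2" } }
  <- { "return": { "dropped-slices": [ {"bitmap0": 0} ] } }
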
>>
>> "How can a user make use of the slices once they're made?"
>>
>> Let's consider something like mode=partial in contrast to
>> mode=incremental, and an example where we have 6 prior slices:
>> 0,1,2,3,4,5 (and, unnamed, the 'active' slice).
>>
>> mode=partial bitmap=foo slice=4
>>
>> This would create a backup from slice 4 to the current time α. This
>> includes all clusters from 4, 5, and the active bitmap.
>>
>> I don't think it is meaningful to define any end point that isn't the
>> current time, so I've omitted that as a possibility.
> 
> John, what are you doing here? This adds option after option, and even
> an additional slice object, only complicating an easy thing more and more.
> I'm not sure if that was your intention, but I feel I'm starting to
> understand better how Linus's rants come about.
> 
> Let me summarise what this means for management layer:
> 
> * The management layer has to manage bitmaps. They have direct control
>   over creation and deletion of bitmaps. So far so good.
> 
> * It also has to manage slices in those bitmap objects; and these
>   slices are what contain the actual bitmaps. In order to identify a
>   bitmap in qemu, you need:
> 
>     a) the node name
>     b) the bitmap ID, and
>     c) the slice number
> 
>   The slice number is assigned by qemu and libvirt has to wait until
>   qemu tells it about the slice number of a newly created slice. If
>   libvirt doesn't receive the reply to the command that started the
>   block job, it needs to be able to query this information from qemu,
>   e.g. in query-block-jobs.
> 
> * Slices are automatically created when you start a backup job with a
>   bitmap. It doesn't matter whether you even intend to do an incremental
>   backup against this point in time. qemu knows better.
> 
> * In order to delete a slice that you don't need any more, you have to
>   create more slices (by doing more backups), but you don't get to
>   decide which one is dropped. qemu helpfully just drops the oldest one.
>   It doesn't matter if you want to keep an older one so you can do an
>   incremental backup for a longer timespan. Don't worry about your
>   backup strategy, qemu knows better.
> 
> * Of course, just creating a new backup job doesn't mean that removing
>   the old slice works, even if you give the respective option. That's
>   what the 'dropped-slices' return is for. So once again wait for
>   whatever qemu did and reproduce it in the data structures of the
>   management tool. It's also more information that needs to be exposed
>   in query-block-jobs because libvirt might miss the return value.
> 
> * Hmm... What happens if you start n backup block jobs, with n > slices?
>   Sounds like a great way to introduce subtle bugs in both qemu and the
>   management layer.
> 
> Do you really think working with this API would be fun for libvirt?
> 
>> "Does a partial backup create a new point in time?"
>>
>> If yes: This means that the next incremental backup must necessarily be
>> based off of the last partial backup that was made. This seems a little
>> inconvenient. This would mean that point in time α becomes "slice 6."
> 
> Or based off any of the previous points in time, provided that qemu
> didn't helpfully decide to delete it. Can't I still create a backup
> starting from slice 4 then?
> 
> Also, a more general question about incremental backup: How does it play
> with snapshots? Shouldn't we expect that people sometimes use both
> snapshots and backups? Can we restrict the backup job to considering
> bitmaps only from a single node or should we be able to reference
> bitmaps of a backing file as well?
> 
>> If no: This means that we lose the point in time when we made the
>> partial and we cannot chain off of the partial backup. It does mean that
>> the next incremental backup will work as normally expected, however.
>> This means that point in time α cannot again be referenced by the
>> management client.
>>
>> This mirrors the dynamic between "incremental" and "differential" backups.
>>
>> ..hmmm..
>>
>> You know, incremental backups are just a special case of "partial" here
>> where slice is the last recorded slice... Let's look at an API like this:
>>
>> mode=<incremental|differential> bitmap=<name> [slice=N]
>>
>> Incremental: We create a new slice if the bitmap has room for one.
>> Differential: We don't create a new slice. The data in the active bitmap
>> α does not get cleared after the bitmap operation.
>>
>> Slice:
>> If not specified, assume we want only the active slice. This is the
>> current behavior in QEMU 2.11.
>> If specified, we create a temporary merge between bitmaps [N..α] and use
>> that for the backup operation.
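
Spelled out in the same shorthand (the arguments follow this sketch
rather than the existing drive-backup API, and the names are
placeholders):

  :: drive-backup device=drive0 mode=incremental bitmap=foo target=inc.6.qcow2
  :: drive-backup device=drive0 mode=differential bitmap=foo slice=4 target=diff.qcow2

The first creates a new slice if there is room; the second merges slices
[4..α] for the copy and leaves the active bitmap uncleared.
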
>>
>> "Can we delete slices?"
>>
>> Sure.
>>
>> :: block-dirty-bitmap-slice-delete bitmap=foo slice=4
>>
>> "Can we create a slice without making a bitmap?"
>>
>> It would be easy to do, but I'm not sure I see the utility. If you use
>> it, then unless you specify the slice manually for the next backup, you
>> will necessarily get something that isn't usable.
>>
>> but we COULD do it, it would just be banking the changes in the active
>> bitmap into a new slice.
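
In the same shorthand, that could be as small as a hypothetical:

  :: block-dirty-bitmap-slice-create bitmap=foo

which just banks the active bitmap's changes into a new numbered slice
without running any backup.
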
> 
> Okay, with explicit management this is getting a little more reasonable
> now. However, I don't understand what slices buy us then compared to
> just separate bitmaps.
> 
> Essentially, bitmaps form a second kind of backing chain. Backup always
> wants to use the combined bitmaps of some subchain. I see two easy ways
> to do this: Either pass an array of bitmaps to consider to the job, or
> store the "backing link" in the bitmap so that we can just specify a
> "base bitmap" like we usually do with normal backing files.
> 
> The backup block job can optionally append a new bitmap to the chain
> like external snapshots do for backing chains. Deleting a bitmap in the
> chain is the merge operation, similar to a commit block job for backing
> chains.
> 
> We know these mechanisms very well because the block layer has been using
> them for ages.
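
The array-of-bitmaps variant might look something like this on the wire
(blockdev-backup is real, but a list-of-bitmaps argument is only a
sketch of the idea, and the names are placeholders):

  { "execute": "blockdev-backup",
    "arguments": { "device": "drive0", "sync": "incremental",
                   "bitmaps": [ "bitmap.4", "bitmap.5", "bitmap.6" ],
                   "target": "target0" } }
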
> 
>>> I also have another idea:
>>> implement a new object: point-in-time or checkpoint. They should have
>>> names, and a simple add/remove API.
>>> And they will be backed by dirty bitmaps, so checkpoint deletion is a
>>> bitmap merge (and deletion of one of them), and
>>> checkpoint creation is disabling the active-checkpoint-bitmap and
>>> starting a new active-checkpoint-bitmap.
>>
>> Yes, exactly! I think that's pretty similar to what I am thinking of
>> with slices.
>>
>> This sounds a little safer to me in that we can examine an operation to
>> see if it's sane or not.
> 
> Exposing checkpoints is a reasonable high-level API. The important part
> then is that you don't expose bitmaps + slices, but only checkpoints
> without bitmaps. The bitmaps are an implementation detail.
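
A checkpoint-level interface could then be as small as something like
this (command and argument names invented purely for illustration):

  :: checkpoint-create node=drive0 name=chk0
  :: checkpoint-delete node=drive0 name=chk0
  :: drive-backup device=drive0 sync=since-checkpoint checkpoint=chk0 target=out.qcow2

with the bitmap disable/merge bookkeeping hidden behind those commands.
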
> 
>>> Then we can implement merging of several bitmaps (from one of the
>>> checkpoints to the current moment) in
>>> NBD meta-context-query handling.
>>>
>> Note:
>>
>> I should say that I've had discussions with Stefan in the past over
>> things like differential mode and the feeling I got from him was that he
>> felt that data should be copied from QEMU precisely *once*, viewing any
>> subsequent copying of the same data as redundant and wasteful.
> 
> That's a management layer decision. Apparently there are users who want
> to copy from qemu multiple times, otherwise we wouldn't be talking about
> slices and retention.
> 
> Kevin
> 

Sorry.

John


