
Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

From: Kevin Wolf
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API
Date: Mon, 11 Dec 2017 12:15:29 +0100
User-agent: Mutt/1.9.1 (2017-09-22)

On 09.12.2017 at 01:57, John Snow wrote:
> Here's an idea of what this API might look like without revealing
> explicit merge/split primitives.
> A new bitmap property that lets us set retention:
> :: block-dirty-bitmap-set-retention bitmap=foo slices=10
> Or something similar, where the default property for all bitmaps is
> zero -- the current behavior: no copies retained.
> By setting it to a non-zero positive integer, the incremental backup
> mode will automatically save a disabled copy when possible.


Operations that create or delete user-visible objects should be
explicit, not automatic. You're trying to implement management layer
functionality in qemu here, but incomplete enough that the artifacts of
it are still visible externally. (A complete solution within qemu
wouldn't expose low-level concepts such as bitmaps on an external
interface, but you would expose something like checkpoints.)

Usually it's not a good idea to have a design where qemu implements
enough to restrict management tools to whatever use case we had in mind,
but not enough to make the management tool's life substantially easier
(by not having to care about some low-level concepts).

> "What happens if we exceed our retention?"
> (A) We push the last one out automatically, or
> (B) We fail the operation immediately.
> A is more convenient, but potentially unsafe if the management tool or
> user wasn't aware that was going to happen.
> B is more annoying, but definitely more safe as it means we cannot lose
> a bitmap accidentally.

Both mean that the management layer not only has to delete bitmaps
according to its own needs, but also has to keep track of the retention
counter, predict what qemu is going to do to the bitmaps, and decide
whether any corrective action needs to be taken.

This is making things more complex rather than simpler.

> I would argue for B with perhaps a force-cycle=true|false that defaults
> to false to let management tools say "Yes, go ahead, remove the old one"
> with additionally some return to let us know it happened:
> {"return": {
>   "dropped-slices": [ {"bitmap0": 0}, ...]
> }}
> This would introduce some concept of bitmap slices into the mix as ID'd
> children of a bitmap. I would propose that these slices are numbered and
> monotonically increasing. "bitmap0" as an object starts with no slices,
> but every incremental backup creates slice 0, slice 1, slice 2, and so
> on. Even after we start deleting some, they stay ordered. These numbers
> then stand in for points in time.
> The counter can (must?) be reset and all slices forgotten when
> performing a full backup while providing a bitmap argument.
> "How can a user make use of the slices once they're made?"
> Let's consider something like mode=partial in contrast to
> mode=incremental, and an example where we have 6 prior slices:
> 0,1,2,3,4,5, (and, unnamed, the 'active' slice.)
> mode=partial bitmap=foo slice=4
> This would create a backup from slice 4 to the current time α. This
> includes all clusters from 4, 5, and the active bitmap.
> I don't think it is meaningful to define any end point that isn't the
> current time, so I've omitted that as a possibility.

John, what are you doing here? This adds option after option, and even
an additional slice object, only complicating an easy thing more and
more. I'm not sure if that was your intention, but I feel I'm starting
to understand better how Linus's rants come about.

Let me summarise what this means for the management layer:

* The management layer has to manage bitmaps. It has direct control
  over creation and deletion of bitmaps. So far so good.

* It also has to manage slices in those bitmap objects, and these
  slices are what contain the actual bitmaps. In order to identify a
  bitmap in qemu, you need:

    a) the node name
    b) the bitmap ID, and
    c) the slice number

  The slice number is assigned by qemu and libvirt has to wait until
  qemu tells it about the slice number of a newly created slice. If
  libvirt doesn't receive the reply to the command that started the
  block job, it needs to be able to query this information from qemu,
  e.g. in query-block-jobs.

* Slices are automatically created when you start a backup job with a
  bitmap. It doesn't matter whether you even intend to do an incremental
  backup against this point in time. qemu knows better.

* In order to delete a slice that you don't need any more, you have to
  create more slices (by doing more backups), but you don't get to
  decide which one is dropped. qemu helpfully just drops the oldest one.
  It doesn't matter if you want to keep an older one so you can do an
  incremental backup for a longer timespan. Don't worry about your
  backup strategy, qemu knows better.

* Of course, just creating a new backup job doesn't mean that removing
  the old slice works, even if you give the respective option. That's
  what the 'dropped-slices' return is for. So once again wait for
  whatever qemu did and reproduce it in the data structures of the
  management tool. It's also more information that needs to be exposed
  in query-block-jobs because libvirt might miss the return value.

* Hmm... What happens if you start n backup block jobs, with n larger
  than the configured number of slices? Sounds like a great way to
  introduce subtle bugs in both qemu and the management layer.
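
To make the bookkeeping in the list above concrete, here is a rough
Python sketch of the state a management tool would have to shadow under
the slice proposal. Everything in it is an assumption for illustration:
SliceMirror, on_backup_started and slice_key are hypothetical names, not
qemu or libvirt interfaces, and the rule "drop the oldest slice when
force-cycle is set" is only my reading of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class SliceMirror:
    """Hypothetical management-side shadow of one bitmap's slices."""
    node: str
    bitmap: str
    retention: int
    slices: list = field(default_factory=list)  # slice numbers, oldest first
    next_slice: int = 0  # the number we expect qemu to assign next

    def on_backup_started(self, force_cycle=False):
        """Predict what qemu would do when an incremental backup starts.

        Returns the list of slice numbers we expect to be dropped.
        """
        if self.retention == 0:
            return []  # default behaviour: no copies retained
        dropped = []
        if len(self.slices) >= self.retention:
            if not force_cycle:
                raise RuntimeError("retention exceeded, backup would fail")
            dropped.append(self.slices.pop(0))  # oldest slice goes first
        self.slices.append(self.next_slice)
        self.next_slice += 1
        return dropped

    def slice_key(self, number):
        """The (node, bitmap ID, slice number) triple identifying a slice."""
        if number not in self.slices:
            raise KeyError(number)
        return (self.node, self.bitmap, number)
```

Note that the tool has to guess the slice number and the dropped slice
in advance, and then reconcile its guess with whatever qemu reports.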

Do you really think working with this API would be fun for libvirt?

> "Does a partial backup create a new point in time?"
> If yes: This means that the next incremental backup must necessarily be
> based off of the last partial backup that was made. This seems a little
> inconvenient. This would mean that point in time α becomes "slice 6."

Or based off any of the previous points in time, provided that qemu
didn't helpfully decide to delete it. Can't I still create a backup
starting from slice 4 then?

Also, a more general question about incremental backup: How does it play
with snapshots? Shouldn't we expect that people sometimes use both
snapshots and backups? Can we restrict the backup job to considering
bitmaps only from a single node or should we be able to reference
bitmaps of a backing file as well?

> If no: This means that we lose the point in time when we made the
> partial and we cannot chain off of the partial backup. It does mean that
> the next incremental backup will work as normally expected, however.
> This means that point in time α cannot again be referenced by the
> management client.
> This mirrors the dynamic between "incremental" and "differential" backups.
> ..hmmm..
> You know, incremental backups are just a special case of "partial" here
> where slice is the last recorded slice... Let's look at an API like this:
> mode=<incremental|differential> bitmap=<name> [slice=N]
> Incremental: We create a new slice if the bitmap has room for one.
> Differential: We don't create a new slice. The data in the active bitmap
> α does not get cleared after the bitmap operation.
> Slice:
> If not specified, assume we want only the active slice. This is the
> current behavior in QEMU 2.11.
> If specified, we create a temporary merge between bitmaps [N..α] and use
> that for the backup operation.
> "Can we delete slices?"
> Sure.
> :: block-dirty-bitmap-slice-delete bitmap=foo slice=4
> "Can we create a slice without making a bitmap?"
> It would be easy to do, but I'm not sure I see the utility. In using it,
> it means if you don't specify the slice manually for the next backup
> that you will necessarily be getting something not usable.
> but we COULD do it, it would just be banking the changes in the active
> bitmap into a new slice.

Okay, with explicit management this is getting a little more reasonable
now. However, I don't understand what slices buy us then compared to
just separate bitmaps.

Essentially, bitmaps form a second kind of backing chain. Backup always
wants to use the combined bitmaps of some subchain. I see two easy ways
to do this: Either pass an array of bitmaps to consider to the job, or
store the "backing link" in the bitmap so that we can just specify a
"base bitmap" like we usually do with normal backing files.

The backup block job can optionally append a new bitmap to the chain
like external snapshots do for backing chains. Deleting a bitmap in the
chain is the merge operation, similar to a commit block job for backing
chains.

We know these mechanisms very well because the block layer has been using
them for ages.
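
As a minimal sketch of the "backing link" variant, assuming a bitmap can
be modelled as a set of dirty cluster indexes: the Bitmap class and the
three helper functions below are hypothetical and exist only for this
example. The merge direction on deletion (into the older neighbour, as a
commit job merges into the backing file) is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bitmap:
    name: str
    dirty: set = field(default_factory=set)  # dirty cluster indexes
    base: Optional["Bitmap"] = None          # "backing link" to the older bitmap


def combined_dirty(top, base_name):
    """Union of dirty clusters from `top` down to the bitmap `base_name`."""
    result = set()
    bm = top
    while bm is not None:
        result |= bm.dirty
        if bm.name == base_name:
            return result
        bm = bm.base
    raise KeyError(base_name)


def append_bitmap(top, name):
    """Start a new active bitmap on top of the chain."""
    return Bitmap(name, base=top)


def delete_bitmap(top, name):
    """Unlink `name` from the chain, merging its bits into its base."""
    if top.name == name:
        raise ValueError("this sketch does not delete the active bitmap")
    newer = top
    while newer.base is not None and newer.base.name != name:
        newer = newer.base
    victim = newer.base
    if victim is None:
        raise KeyError(name)
    if victim.base is not None:
        victim.base.dirty |= victim.dirty  # the merge step
    newer.base = victim.base
```

A backup "from b1 to now" would then copy combined_dirty(top, "b1"), and
the array variant is the same union over an explicitly passed list.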

> > I also have another idea:
> > implement new object: point-in-time or checkpoint. They should have
> > names, and the simple add/remove API.
> > And they will be backed by dirty bitmaps. so checkpoint deletion is
> > bitmap merge (and delete one of them),
> > checkpoint creation is disabling of active-checkpoint-bitmap and
> > starting new active-checkpoint-bitmap.
> Yes, exactly! I think that's pretty similar to what I am thinking of
> with slices.
> This sounds a little safer to me in that we can examine an operation to
> see if it's sane or not.

Exposing checkpoints is a reasonable high-level API. The important part
then is that you don't expose bitmaps + slices, but only checkpoints
without bitmaps. The bitmaps are an implementation detail.
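
For illustration, a checkpoint-only interface along the lines quoted
above might look like the following Python sketch. The Checkpoints class
and its method names are hypothetical, and sets of cluster indexes stand
in for real dirty bitmaps. The caller only ever names checkpoints; the
bitmaps stay private.

```python
class Checkpoints:
    """Hypothetical facade: callers see named checkpoints, never bitmaps."""

    def __init__(self):
        # (name, dirty clusters since that checkpoint), oldest first.
        # The last entry plays the role of the active bitmap.
        self._chain = []

    def _index(self, name):
        for i, (n, _) in enumerate(self._chain):
            if n == name:
                return i
        raise KeyError(name)

    def add(self, name):
        """Create a checkpoint: freeze the active bitmap, start a new one."""
        if any(n == name for n, _ in self._chain):
            raise ValueError("checkpoint already exists: " + name)
        self._chain.append((name, set()))

    def mark_dirty(self, cluster):
        """Record a guest write (nothing is tracked before the first add)."""
        if self._chain:
            self._chain[-1][1].add(cluster)

    def remove(self, name):
        """Delete a checkpoint by merging its bitmap into the older one."""
        i = self._index(name)
        _, dirty = self._chain.pop(i)
        if i > 0:
            self._chain[i - 1][1].update(dirty)
        # Removing the oldest checkpoint simply forgets its bits.

    def dirty_since(self, name):
        """Clusters changed between checkpoint `name` and now."""
        i = self._index(name)
        result = set()
        for _, dirty in self._chain[i:]:
            result |= dirty
        return result
```

The management tool decides when to add and remove checkpoints; nothing
is created or dropped behind its back.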

> > Then we can implement merging of several bitmaps (from one of
> > checkpoints to current moment) in
> > NBD meta-context-query handling.
> > 
> Note:
> I should say that I've had discussions with Stefan in the past over
> things like differential mode and the feeling I got from him was that he
> felt that data should be copied from QEMU precisely *once*, viewing any
> subsequent copying of the same data as redundant and wasteful.

That's a management layer decision. Apparently there are users who want
to copy from qemu multiple times, otherwise we wouldn't be talking about
slices and retention.

