
Re: [PATCH RFC v2 1/5] block: add bitmap-populate job


From: Peter Krempa
Subject: Re: [PATCH RFC v2 1/5] block: add bitmap-populate job
Date: Mon, 8 Jun 2020 11:38:12 +0200
User-agent: Mutt/1.13.4 (2020-02-15)

On Sat, Jun 06, 2020 at 09:55:13 +0300, Vladimir Sementsov-Ogievskiy wrote:
> 05.06.2020 13:59, Peter Krempa wrote:
> > On Fri, Jun 05, 2020 at 12:07:47 +0200, Kevin Wolf wrote:
> > > Am 05.06.2020 um 11:58 hat Peter Krempa geschrieben:
> > > > On Fri, Jun 05, 2020 at 11:44:07 +0200, Kevin Wolf wrote:
> > 
> > [...]
> > 
> > > > The above was actually inspired by a very recent problem I have in my
> > > > attempt to use the dirty bitmap populate job to refactor how libvirt
> > > > handles bitmaps. I've just figured out that I need to shuffle around
> > > > some stuff as I can't run the dirty-bitmap-populate job while an active
> > > > layer commit is in synchronised phase and I wanted to do the merging at
> > > > that point. That reminded me of a possible gotcha in having to sequence
> > > > the blockjobs which certainly would be more painful.
> > > 
> > > It would probably be good to have not only an iotests case that tests
> > > the low-level functionality of the block job, but also one that
> > > resembles the way libvirt would actually use it in combination with
> > > other jobs.
> > 
> 
> Hi! Sorry for missing the discussion for so long.
> 
> About the new job semantics: if you create temporary bitmaps anyway, I do think 
> that we should allow populating into the target bitmap directly, without 
> creating any internal temporary bitmaps. I suggested it when reviewing v1; 
> John argued for more transaction-like semantics to look like other jobs. 
> Still, we can support both modes if we want.
> 
> Allowing one target to be used by several populating jobs is an interesting 
> idea. The current series does "bdrv_dirty_bitmap_set_busy(target_bitmap, true);", 
> which forbids it.. Hmm. If we just drop it, nothing prevents the user from 
> removing the target bitmap during the job. So, we'll need something like a 
> collective-busy bitmap..
> 
> > I certainly can document the way we'll use it but that in turn depends
> > on how the job behaves.
> > 
> > With the current state of the job I plan to use it in two scenarios:
> > 
> > Preface: I'm currently changing libvirt to use one active bitmap per
> > checkpoint (a checkpoint is the name for the point in time we want to take
> > a backup from). This means that a layer of the backing chain can have
> > multiple active bitmaps depending on how many checkpoints were created
> > in the current top layer. (Previously we tried to optimize things by
> > having just one bitmap active, but the semantics were getting too crazy
> > to be maintainable long-term.)
> 
> Hmm. I had a plan of creating "lazy" disabled bitmaps, to optimize the scenario 
> with one active bitmap, so that disabled bitmaps are not loaded into RAM on 
> start, but only on demand. But how to do that with the "many active bitmaps" 
> scenario? I don't think that's a good idea.. Possibly, we can implement 
> laziness by internally making only one active bitmap and merging it here and 
> there when you request some active bitmap which we actually didn't load yet..
> 
> Could you describe what the exact problem with the "several disabled - one 
> active" scheme is, and how it is solved by "several active"?

The 'several disabled, one active' semantics _heavily_ depend on metadata
which must be tracked outside of qemu and thus are more prone to breakage.
If any of the intermediate bitmaps is broken or missing, everything breaks.

Then there's the complexity of the code which handles merging of the
bitmaps during block jobs. Jobs such as blockdev-mirror in full mode and
block-commit squash together the data and we need to do something about
the bitmaps for the backups to work properly afterwards.

Without considering overlays which were created without propagating
bitmaps, the code was already getting hairy, especially in the case of
backups, where we needed to stitch together all the bitmaps
corresponding to the given point in time the backup is taken from.

When we add overlays without any bitmaps into the mix, the code for
resolving which bitmaps to merge becomes very unpleasant and hard to
understand and maintain, and that is the main reason for the switch.

I don't want to add unnecessary complexity to the libvirt code which
will make it more fragile or hard to understand and fix in the future.

Both points I've heard so far (performance, and backup granularity in
the case of a non-default qcow2 cluster size) don't seem compelling
enough to me to make my life implementing the feature in libvirt so
much harder.

Also, users can simply remove the point in time they wished to back up
from after a successful backup, which will also remove the corresponding
active bitmap.
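
Removing a checkpoint then boils down to a single command per image,
roughly (node and bitmap names made up):

  {"execute": "block-dirty-bitmap-remove",
   "arguments": {"node": "top", "name": "chk0"}}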

> > Bitmaps are no longer propagated over to upper layers when creating
> > snapshots as we can use block-dirty-bitmap-populate instead.
> 
> Unexpected turn. When this whole topic started, it was reasoned more like 
> "if the user forgot to create a bitmap at the start, let's help them".. But 
> now it becomes the common scenario. Hmm.

It's not only a "user forgot" thing, but more that a systemic change
would be required.

Additionally, until _very_ recently it wasn't possible to create
bitmaps using qemu-img, which made it impossible to create overlays for
inactive VMs. Arguably this has changed, so we could require it. It
still adds a moving part which can break if the user doesn't add the
bitmap, or requires yet another special case if we want to compensate
for that.
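
For reference, with the new qemu-img subcommand adding the bitmap when
creating an overlay is roughly (file and bitmap names made up):

  qemu-img bitmap --add overlay.qcow2 chk0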

As such, in libvirt's tech-preview implementation that is currently
upstream, if you create a qcow2 overlay without adding the appropriate
bitmaps, the backups simply won't work.

> What do you think of granularity? We in Virtuozzo use a 1M cluster size as the 
> default for qcow2 images. But we use the 64k (default) granularity for bitmaps, 
> to have smaller incremental backups. So, this is an advantage of creating a 
> bitmap over relying on the block-status capturing done by 
> block-dirty-bitmap-populate: you don't control the dirtiness granularity. So, 
> I think that bitmap propagation, or just creating a new dirty bitmap to track 
> dirtiness from the start of the new snapshot, is better.

This is a valid argument. Backups in this scenario will be bigger. I
still don't feel that the code needs to be made much more complex
because of it, though.
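
(For reference, the granularity Vladimir refers to is the one picked at
bitmap creation time, e.g. roughly, with made-up names:

  {"execute": "block-dirty-bitmap-add",
   "arguments": {"node": "top", "name": "chk0",
                 "persistent": true, "granularity": 65536}}

whereas a bitmap populated from block-status ends up with the allocation
granularity of the image, i.e. the cluster size for qcow2.)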

> > 1) backup
> > 
> > Prior to doing the backup I'm figuring out the final backup bitmap. This
> > involves creating a temporary bitmap populated by the job for every
> > layer of the backing chain above the one which contains the bitmap we
> > want to take the backup from, and then merging all of them together as
> > a base for the backup.
> 
> (just thinking out loud)
> 
> So, assume the sequence top -> middle -> base
> 
> If we have a backup which was done when we were in base, then the bitmap is 
> stored in base. It is loaded and active, but doesn't really change, as base 
> is opened read-only.
> We merge the block-status information of top and middle together with this 
> bitmap, and aggregate the difference between the last backup and the current 
> state.
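
Yes. In QMP terms that sequence would look roughly like this (node and
bitmap names made up; 'block-dirty-bitmap-populate' is still the RFC
command, so its exact arguments may differ):

  {"execute": "block-dirty-bitmap-add",
   "arguments": {"node": "top", "name": "tmp-top"}}
  {"execute": "block-dirty-bitmap-populate",
   "arguments": {"job-id": "pop0", "node": "top", "name": "tmp-top"}}
  (the same for 'middle', then merge everything for the backup)
  {"execute": "block-dirty-bitmap-merge",
   "arguments": {"node": "top", "target": "tmp-top",
                 "bitmaps": [{"node": "middle", "name": "tmp-middle"},
                             {"node": "base", "name": "chk0"}]}}
  {"execute": "blockdev-backup",
   "arguments": {"device": "top", "target": "backup-store",
                 "sync": "incremental", "bitmap": "tmp-top"}}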
> 
> > 
> > 2) blockjobs
> > 
> > Note: This is currently an outline of how things should be, as I've hit
> > a snag with attempting to run the population jobs during the 'ready' state
> > of an active-layer block-commit/blockdev-mirror job only an hour ago and
> > I need to change a few things.
> > 
> > 2.1) active layer block-commit/blockdev-mirror
> > 
> > When the job reaches the 'ready' state I'll create bitmaps in the
> > destination/base image of the job for every bitmap in the images
> > dropped/merged by the blockjob (we use blockdev-mirror in full-sync
> > mode). This will capture the writes that happen after 'job-complete'.
> > 
> > The job will then be completed, and step 2.2 will be executed as well.
> 
> So, the aim is to not miss any new writes after switching to the new bs, 
> while not capturing into the bitmaps the writes which copy the whole disk 
> during the mirror.
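
Exactly. Roughly (made-up names; 'mir0' is the mirror job that has just
reached 'ready'):

  {"execute": "block-dirty-bitmap-add",
   "arguments": {"node": "base", "name": "chk0", "persistent": true}}
  {"execute": "block-job-complete", "arguments": {"device": "mir0"}}

so anything written after the switch is recorded in the new bitmap.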
> 
> > 
> > 2.2) non-active commit and also continuation of active layer 
> > block-commit/blockdev-mirror
> > 
> > After the job has completed successfully I'll create temporary
> > non-persistent bitmaps for/in the images removed by the blockjob and
> > merge them into the destination image's bitmaps depending on their
> > original location in the backing chain, so that the bitmap state still
> > properly describes which blocks have changed.
> 
> I don't follow. How do you populate these new temporary bitmaps? They are 
> empty after creation..

With the 'block-dirty-bitmap-populate' block job.
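
I.e. for each image removed by the job, roughly (made-up names; again
the populate job's arguments are the RFC ones and may change):

  {"execute": "block-dirty-bitmap-add",
   "arguments": {"node": "removed", "name": "tmp"}}
  {"execute": "block-dirty-bitmap-populate",
   "arguments": {"job-id": "pop1", "node": "removed", "name": "tmp"}}
  {"execute": "block-dirty-bitmap-merge",
   "arguments": {"node": "base", "target": "chk0",
                 "bitmaps": [{"node": "removed", "name": "tmp"}]}}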

> > After that the original images will be blockdev-del-eted. The above is
> > partially in use today and, since the job is already completed, also
> > requires blockdev-reopen to successfully write to the bitmaps.
> > 
> > ----
> > 
> > While writing the above down I've actually realized that controlling the
> > destination of the bitmap might not be as useful as I originally thought,
> > as in step 2.2 I might need the allocation bitmap merged
> > into multiple bitmaps, so I'd either need a temporary bitmap anyway or
> > would have to re-run the job multiple times, which seems wasteful. I'm no
> > longer fully persuaded that adding the 'merge' step to the dirty
> > populate blockjob will be the best thing since sliced bread.
> > 
> 
> What is the 'merge' step?

In some previous replies to Kevin, we discussed that it might be worth
optimizing 'block-dirty-bitmap-populate' to directly populate the bits
in the target bitmap rather than merging them in after the job is
complete, so effectively merging directly. I probably described it
wrong here.

> Do you mean that populating directly into the target bitmap is not really needed?

I need the bitmap populated by 'block-dirty-bitmap-populate' to be
merged into multiple bitmaps in the new semantics. If the job itself
doesn't support those semantics, changing it to just directly populate
one bitmap doesn't seem to be worth it, since I'll be using intermediate
bitmaps anyway.
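
With intermediate bitmaps the multiple-destination case is then just
several cheap merges of the same populated bitmap (made-up names):

  {"execute": "block-dirty-bitmap-merge",
   "arguments": {"node": "base", "target": "chk0", "bitmaps": ["tmp"]}}
  {"execute": "block-dirty-bitmap-merge",
   "arguments": {"node": "base", "target": "chk1", "bitmaps": ["tmp"]}}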



