Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps


From: Kevin Wolf
Subject: Re: [Qemu-block] [PATCH v15 08/25] block: introduce auto-loading bitmaps
Date: Fri, 17 Feb 2017 14:34:30 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On 17.02.2017 at 14:22, Denis V. Lunev wrote:
> On 02/17/2017 03:48 PM, Kevin Wolf wrote:
> > On 17.02.2017 at 13:40, Vladimir Sementsov-Ogievskiy wrote:
> >> 17.02.2017 15:09, Kevin Wolf wrote:
> >>> On 17.02.2017 at 12:46, Vladimir Sementsov-Ogievskiy wrote:
> >>>> 16.02.2017 14:49, Kevin Wolf wrote:
> >>>>> On 16.02.2017 at 12:25, Kevin Wolf wrote:
> >>>>>> On 15.02.2017 at 11:10, Vladimir Sementsov-Ogievskiy wrote:
> >>>>>>> Auto-loading bitmaps are bitmaps stored in the disk image, which should
> >>>>>>> be loaded when the image is opened and become BdrvDirtyBitmaps for the
> >>>>>>> corresponding drive.
> >>>>>>>
> >>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
> >>>>>>> Reviewed-by: John Snow <address@hidden>
> >>>>>>> Reviewed-by: Max Reitz <address@hidden>
> >>>>>> Why do we need a new BlockDriver callback and special code for it in
> >>>>>> bdrv_open_common()? The callback is only ever called immediately after
> >>>>>> .bdrv_open/.bdrv_file_open, so can't the drivers just do this internally
> >>>>>> in their .bdrv_open implementation? Even more so because qcow2 is the
> >>>>>> only driver that supports this callback.
> >>>>> Actually, don't we have to call this in qcow2_invalidate_cache()?
> >>>>> Currently, I think, after a migration, the autoload bitmaps aren't
> >>>>> loaded.
> >>>>>
> >>>>> By moving the qcow2_load_autoloading_dirty_bitmaps() call to
> >>>>> qcow2_open(), this would be fixed.
> >>>>>
> >>>>> Kevin
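
For context: qcow2_invalidate_cache() in block/qcow2.c throws away the
cached qcow2 state and then calls qcow2_open() again, which is why moving
the load there would also cover the post-migration invalidate. A rough
sketch of that placement (the prototype of the series'
qcow2_load_autoloading_dirty_bitmaps() helper is assumed here):

/*
 * Sketch only, not the actual patch: block/qcow2.c, QEMU 2.8 era.
 * The prototype of qcow2_load_autoloading_dirty_bitmaps(), the helper
 * added by this series, is assumed.
 */
static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
                      Error **errp)
{
    Error *local_err = NULL;
    int ret;

    /* ... existing header, L1 table and refcount table setup ... */

    /*
     * Load the autoload bitmaps here instead of through a separate
     * BlockDriver callback in bdrv_open_common().  Because
     * qcow2_invalidate_cache() closes the qcow2 state and calls
     * qcow2_open() again, the bitmaps would then also be re-read on
     * the destination after an incoming migration.
     */
    qcow2_load_autoloading_dirty_bitmaps(bs, &local_err); /* prototype assumed */
    if (local_err) {
        error_propagate(errp, local_err);
        ret = -EINVAL;
        goto fail;
    }

    /* ... remaining option parsing and cache setup ... */
    return 0;

fail:
    /* ... existing cleanup ... */
    return ret;
}
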
> >>>> The bitmaps should not be reloaded on any intermediate qcow2 opens,
> >>>> reopens, etc. They should be loaded once, on bdrv_open, so as not to
> >>>> create extra collisions (between the in-memory bitmap and its stored
> >>>> version). That was the idea.
> >>>>
> >>>> For bitmap migration there is a separate series; we shouldn't load a
> >>>> bitmap from the file on migration, as its version in the file is
> >>>> outdated.
> >>> That's not what your series is doing, though. It loads the bitmaps when
> >> Actually, they will not be loaded, as they will have the IN_USE flag set.
> >>
> >>> migration starts and doesn't reload them when migration completes, even
> >>> though they are stale. Migration with shared storage would just work
> >>> without an extra series if you did these things in the correct places.
> >>>
> >>> As a reminder, this is how migration with shared storage works (or
> >>> should work with your series):
> >>>
> >>> 1. Start destination qemu instance. This calls bdrv_open() with
> >>>    BDRV_O_INACTIVE. We can read in some metadata, though we don't need
> >>>    much more than the image size at this point. Writing to the image is
> >>>    still impossible.
> >>>
> >>> 2. Start migration on the source, while the VM is still writing to the
> >>>    image, rendering the cached metadata from step 1 stale.
> >>>
> >>> 3. Migration completes:
> >>>
> >>>     a. Stop the VM
> >>>
> >>>     b. Inactivate all images in the source qemu. This is where all
> >>>        metadata needs to be written back to the image file, including
> >>>        bitmaps. No writes to the image are possible after this point
> >>>        because BDRV_O_INACTIVE is set.
> >>>
> >>>     c. Invalidate the caches in the destination qemu, i.e. reload
> >>>        everything from the file that could have changed since step 1,
> >>>        including bitmaps. BDRV_O_INACTIVE is cleared, making the image
> >>>        ready for writes.
> >>>
> >>>     d. Resume the VM on the destination
> >>>
> >>> 4. Exit the source qemu process, which involves bdrv_close(). Note that
> >>>    at this point, no writing to the image file is possible any more;
> >>>    it's the destination qemu process that owns the image file now.
> >>>
> >>> Your series loads and stores bitmaps in steps 1 and 4. This means that
> >> Actually, no. In step 1 the bitmaps are "in use"; in step 4
> >> BDRV_O_INACTIVE is set (and it is checked), so nothing is stored.
> >>
> >>> they are stale on the destination when migration completes, and that
> >>> bdrv_close() wants to write to an image file that it doesn't own any
> >>> more, which will cause an assertion failure. If you instead move things
> >>> to steps 3b and 3c, it will just work.
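
In driver terms, step 3b corresponds to a .bdrv_inactivate implementation
and step 3c to the existing .bdrv_invalidate_cache callback. A minimal
sketch of that shape; qcow2_store_persistent_bitmaps() is a hypothetical
name for the store counterpart, everything else follows block/qcow2.c:

/*
 * Sketch of hooking the bitmap handling into steps 3b and 3c.
 * qcow2_store_persistent_bitmaps() is a hypothetical name for the
 * helper that writes the in-memory bitmaps back into the image.
 */

/* Step 3b: reached via bdrv_inactivate_all() on the source when
 * migration completes, while writing to the image is still allowed. */
static int qcow2_inactivate(BlockDriverState *bs)
{
    Error *local_err = NULL;

    qcow2_store_persistent_bitmaps(bs, &local_err);    /* hypothetical */
    if (local_err) {
        error_report_err(local_err);
        return -EINVAL;
    }
    return 0;
}

/* Step 3c: reached via bdrv_invalidate_cache_all() on the destination
 * before the VM is resumed.  The existing implementation drops the
 * cached qcow2 state and runs qcow2_open() again, so a bitmap load
 * placed in qcow2_open() is repeated here and picks up whatever the
 * source stored in step 3b. */
static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
{
    /* ... existing close + reopen of the qcow2 state ... */
}

/* Registration (existing fields omitted): */
static BlockDriver bdrv_qcow2 = {
    .format_name            = "qcow2",
    .bdrv_inactivate        = qcow2_inactivate,
    .bdrv_invalidate_cache  = qcow2_invalidate_cache,
    /* ... */
};
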
> >> Hmm, I understand the idea. But this will interfere with postcopy
> >> bitmap migration, so if we really need this, there should be some
> >> additional control flags or capabilities. The problem with your
> >> approach is that the bitmaps are actually migrated during the short
> >> window when both source and destination are stopped; that may take
> >> time, as bitmaps may be large.
> > You can always add optimisations, but this is the basic lifecycle
> > process of block devices in qemu, so it would be good to adhere to it.
> > So far there are no other pieces of information that are ignored in
> > bdrv_invalidate()/bdrv_inactivate() and instead only handled in
> > bdrv_open()/bdrv_close(). It's a matter of consistency, too.
> >
> > And not having to add special cases for specific features in the generic
> > bdrv_open()/close() paths is a big plus for me anyway.
> >
> > Kevin
> But this is certainly bad from the downtime point of view. On migration
> you will have to write to the image and re-read it again on the target.
> This would be very slow. It will not help for migration with a
> non-shared disk either.
> 
> That is why we have specifically worked on a migration scheme which does
> not influence downtime at all now.
> 
> With a write we are issuing several write requests + sync. Our
> measurements show that bdrv_drain could take around a second on an
> averagely loaded conventional system, which seems like an unacceptable
> addition to me.

I'm not arguing against optimising migration; I fully agree with you
there. I just think that we should start with a correct, if slow, base
version and then add optimisations to it, instead of starting with a
broken base version and adding to that.

Look, whether you do the expensive I/O on open/close and make that a
slow operation or whether you do it on invalidate_cache/inactivate
doesn't really make a difference in terms of slowness because in general
both operations are called exactly once. But it does make a difference
in terms of correctness.

Once you do the optimisation, of course, you'll skip writing those
bitmaps that you transfer using a different channel, no matter whether
you skip it in bdrv_close() or in bdrv_inactivate().
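
A sketch of what that skip could look like inside the store path; every
helper marked "hypothetical" below is an illustrative name only, not
something taken from the series or from QEMU:

/*
 * Sketch of the optimisation: when writing bitmaps back in step 3b,
 * skip the ones that a separate bitmap-migration series transfers
 * through the migration stream itself.
 */
static int qcow2_store_persistent_bitmaps(BlockDriverState *bs,  /* hypothetical */
                                          Error **errp)
{
    BdrvDirtyBitmap *bm;

    for (bm = bdrv_dirty_bitmap_first(bs); bm != NULL;   /* hypothetical iterator */
         bm = bdrv_dirty_bitmap_next(bs, bm)) {
        if (bdrv_dirty_bitmap_sent_via_stream(bm)) {      /* hypothetical */
            /* This bitmap travels through the migration stream; don't
             * spend downtime writing it into the qcow2 file as well. */
            continue;
        }
        if (qcow2_write_one_bitmap(bs, bm, errp) < 0) {   /* hypothetical */
            return -EINVAL;
        }
    }
    return 0;
}

The same check would work in bdrv_close() instead; the point is only that
the skip belongs in whichever path ends up doing the store.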

Kevin


