Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through s

qemu-block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through s

From:	Vladimir Sementsov-Ogievskiy
Subject:	Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage
Date:	Fri, 30 Mar 2018 16:31:24 +0300
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

29.03.2018 18:09, Vladimir Sementsov-Ogievskiy wrote:

29.03.2018 17:03, Max Reitz wrote:

On 2018-03-29 10:08, Vladimir Sementsov-Ogievskiy wrote:

28.03.2018 17:53, Max Reitz wrote:

On 2018-03-27 12:11, Vladimir Sementsov-Ogievskiy wrote:

[...]

isn't it because a lot of cat processes? will check, update loop to
i=0; while check -qcow2 169; do ((i++)); echo $i OK; killall -9 cat;
done

Hmm... I know I tried to kill all of the cats, but for some reasonthat

didn't really help yesterday.  Seems to help now, for 2.12.0-rc0 at
least (that is, before this series).

reproduced with killing... (without these series, just on master)

After the whole series, I still get a lot of failures in 169
(mismatching bitmap hash, mostly).

And interestingly, if I add an abort():

diff --git a/block/qcow2.c b/block/qcow2.c
index 486f3e83b7..9204c1c0ac 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1481,6 +1481,7 @@ static int coroutine_fn
qcow2_do_open(BlockDriverState *bs, QDict *options,     }

       if (bdrv_dirty_bitmap_next(bs, NULL)) {
+        abort();
           /* It's some kind of reopen with already existing dirty
bitmaps. There
            * are no known cases where we need loading bitmaps in such
situation,
            * so it's safer don't load them.

Then this fires for a couple of test cases of 169 even without thethird

patch of this series.

I guess bdrv_dirty_bitmap_next() reacts to some bitmaps that migration
adds or something?  Then this would be the wrong condition, because I
guess we still want to load the bitmaps that are in the qcow2 file.

I'm not sure whether bdrv_has_readonly_bitmaps() is the correct
condition then, either, though.  Maybe let's take a step back: We want

to load all the bitmaps from the file exactly once, and that iswhen it

is opened the first time.  Or that's what I would have thought...  Is
that even correct?

Why do we load the bitmaps when the device is inactive anyway?
Shouldn't we load them only once the device is activated?

Hmm, not sure. May be, we don't need. But we anyway need to load them,

when opening read-only, and we should correspondingly reopen in thiscase.

Yeah, well, yeah, but the current state seems just wrong. Apparently
there are cases where a qcow2 node may have bitmaps before we try to
load them from the file, so the current condition doesn't work.

Furthermore, it seems like the current "state machine" is too complex so
we don't know which cases are possible anymore and what to do when.

So the first thing we could do is add a field to the BDRVQCow2State that
tells us whether the bitmaps have been loaded already or not. If not,
we invoke qcow2_load_dirty_bitmaps() and set the value to true. If the
value was true already and the BDS is active and R/W now, we call
qcow2_reopen_bitmaps_rw_hint().  That should solve one problem.


good idea, will do.


The other problem of course is the question whether we should call
qcow2_load_dirty_bitmaps() at all while the drive is still inactive.
You know the migration model better than me, so I'm asking this question
to you.  We can phrase it differently: Do we need to load the bitmaps
before the drive is activated?

Now I think that we don't need. At least, we don't have such cases inVirtuozzo (I hope :).


Why did I doubt:

1. We have cases, when we want to start vm as inactive, to be able toexport it's drive as NBD export, push some data to it and then startthe VM (which involves activating)2. We have cases, when we want to start vm stopped and operate ondirty bitmaps.

If just open all images in inactive mode until vm start, it looks likewe need bitmaps in inactive mode (for 2.). But it looks like wrongapproach anyway.Firstly, I tried to solve (1.) by simply inactivate_all() in case ofstart vm in paused mode, but it breaks at least (2.), so finally, Isolved (1.) by an approach similar with "-incoming defer". So, we haveinactive mode in two cases:

 - incoming migration
 - push data to vm before start

and, in these cases, we don't need to load dirty-bitmaps.

Also, inconsistency: now, we remove persistent bitmaps on inactivate.So, it is inconsistent to load the in inactive mode.


Ok, I'll try to respin.


finally, what cases we actually have for qcow2_do_open?

1. INACTIVE -> ACTIVE (through invalidate_cache, we obviously shouldload bitmaps, if we decided that we have no persistent bitmaps inINACTIVE mode)2. creating new bdrv state (first open of the image) in INACTIVE mode(will not load bitmaps)3. creating new bdrv state (first open of the image) in ACTIVE mode(will load bitmaps, maybe read-only if disk is RO)

If only these three cases, it would be enough to just load bitmaps if!INACTIVE and do nothing otherwise.


Or, we have some of the following cases too?

1?. ACTIVE -> ACTIVE (through invalidate_cache, some kind of no-op, weshould not reload bitmaps)2?. RO -> RW (we should reopen_bitmaps_rw) (or it is possible onlythrough bdrv_reopen, which will not touch qcow2_do_open?

3?. RW -> RO (reopen_bitmaps_ro ?)
? something other??

Max


about 169, how often is it reproducible for you?


it becomes very interseting.

persistent-migbitmap-online case failed on line 136. with mismathcedbitmap sha. This check is not relate to migration, on this line, bitmapis already successfully migrated. But for some reason it is corruptedafter stop/start the VM. How is it possible - I can't imagine. But itlooks not really related to migration.. May it relate to case, whenpostcopy was not finished or something like this?.. maybe fixingwait(RESUME) to something more appropriate will help. But it is strangeanyway.

persistent-notmigbitmap-offline case failed on ine 128, which is"self.vm_b.event_wait("RESUME", timeout=10.0)" with timeout.. can weskip RESUME for some reason? maybe just move to MIGRATION event willhelp, will check


--
Best regards,
Vladimir

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-block] [PATCH v4 1/3] qcow2-bitmap: add qcow2_reopen_bitmaps_rw_hint(), (continued)
- [Qemu-block] [PATCH v4 1/3] qcow2-bitmap: add qcow2_reopen_bitmaps_rw_hint(), Vladimir Sementsov-Ogievskiy, 2018/03/20
- Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Max Reitz, 2018/03/21
- Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Max Reitz, 2018/03/26
  - Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Vladimir Sementsov-Ogievskiy, 2018/03/27
    - Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Vladimir Sementsov-Ogievskiy, 2018/03/27
    - Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Vladimir Sementsov-Ogievskiy, 2018/03/27
    - Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Max Reitz, 2018/03/28
    - Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Vladimir Sementsov-Ogievskiy, 2018/03/29
    - Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Max Reitz, 2018/03/29
    - Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Vladimir Sementsov-Ogievskiy, 2018/03/29
    - Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Vladimir Sementsov-Ogievskiy <=
    - Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage, Vladimir Sementsov-Ogievskiy, 2018/03/30

Prev by Date: [Qemu-block] [PATCH for-2.12] nbd: trace meta context negotiation
Next by Date: [Qemu-block] [PATCH for 2.12] iotests: fix 208 for luks format
Previous by thread: Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage
Next by thread: Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage
Index(es):
- Date
- Thread