Re: [Qemu-devel] Restoring bitmaps after failed/cancelled migration

From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] Restoring bitmaps after failed/cancelled migration
Date: Wed, 16 May 2018 18:52:28 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

16.05.2018 18:32, Kevin Wolf wrote:
On 16.05.2018 at 17:10, Vladimir Sementsov-Ogievskiy wrote:
16.05.2018 15:47, Kevin Wolf wrote:
On 14.05.2018 at 12:09, Vladimir Sementsov-Ogievskiy wrote:
14.05.2018 09:41, Fam Zheng wrote:
On Wed, 04/18 17:00, Vladimir Sementsov-Ogievskiy wrote:
Is it possible that the target changes the disk and we then return control
to the source? In that case the bitmaps will be invalid. So shouldn't we drop
all the bitmaps on inactivate?
Yes, dropping all live bitmaps upon inactivate sounds reasonable. If the dst
fails to start, and we want to resume VM at src, we could (optionally?) reload
the persistent bitmaps, I guess.
Reload from where? We didn't store them.
Maybe this just means that it turns out that not storing them was a bad
idea?

What was the motivation for not storing the bitmap? The additional
downtime? Is it really that bad, though? Bitmaps should be fairly small
for the usual image sizes and writing them out should be quick.
What are the usual ones? A bitmap with the standard granularity of 64k for a
16 TB disk is ~30 MB. If we have several such bitmaps, the downtime may be
significant.
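The size estimate can be checked with a short calculation. This is only a sketch: it assumes one bit per granularity-sized chunk and binary units (TiB, KiB), and the function name is made up for illustration, not taken from QEMU.

```python
def bitmap_size_bytes(disk_bytes: int, granularity_bytes: int) -> int:
    """Bytes needed for a dirty bitmap with one bit per granularity-sized chunk."""
    # Round up so that a trailing partial chunk still gets its own bit.
    bits = -(-disk_bytes // granularity_bytes)
    # Round up to a whole number of bytes.
    return -(-bits // 8)


KIB = 1 << 10
MIB = 1 << 20
TIB = 1 << 40

size = bitmap_size_bytes(16 * TIB, 64 * KIB)
print(size / MIB)  # 32.0
```

Under these assumptions one bitmap is 32 MiB, which matches the "~30 MB" figure; several bitmaps scale linearly.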
We could have an in-memory bitmap that tracks which parts of the
persistent bitmap are dirty so that you don't have to write out the
whole 30 MB during the migration downtime, but can already flush most of
the persistent bitmap before the VM is stopped.
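The idea of a second, in-memory bitmap tracking which parts of the persistent bitmap still need writing can be sketched roughly as follows. This is an illustration only, not QEMU code: the class, the chunk size, and the dict standing in for the image file are all assumptions.

```python
class TrackedBitmap:
    """Dirty bitmap plus a record of which chunks of its serialized
    form have changed since the last flush (the "meta-bitmap")."""

    def __init__(self, nbits: int, chunk_bytes: int = 4096):
        self.data = bytearray((nbits + 7) // 8)
        self.chunk_bytes = chunk_bytes
        self.dirty_chunks = set()  # indices of chunks not yet written out

    def set_bit(self, bit: int) -> None:
        byte = bit // 8
        mask = 1 << (bit % 8)
        if not self.data[byte] & mask:
            self.data[byte] |= mask
            # Only an actual change makes the containing chunk dirty.
            self.dirty_chunks.add(byte // self.chunk_bytes)

    def flush(self, storage: dict) -> int:
        """Write only the dirty chunks to storage; return bytes written."""
        written = 0
        for idx in sorted(self.dirty_chunks):
            start = idx * self.chunk_bytes
            chunk = bytes(self.data[start:start + self.chunk_bytes])
            storage[idx] = chunk  # hypothetical stand-in for an image write
            written += len(chunk)
        self.dirty_chunks.clear()
        return written


bm = TrackedBitmap(nbits=1 << 20)  # 128 KiB of bitmap data, 32 chunks
storage = {}
for bit in (0, 5, 40000):
    bm.set_bit(bit)
pre = bm.flush(storage)    # flushed while the VM is still running
bm.set_bit(7)              # a late guest write just before the VM stops
final = bm.flush(storage)  # flushed during the downtime window
print(pre, final)  # 8192 4096
```

In this sketch the final flush during downtime writes only the one chunk touched after the earlier flush, rather than the whole bitmap.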


Yes, it looks possible. But how would we control that downtime? Introduce a migration state with a specific _pending function? However, it may not be necessary.

Anyway, I think we don't need to store it.

If we decide to resume the source, the bitmap is already in memory, so why reload it? If someone has already killed the source (which was in paused mode), it is inconsistent anyway, and the loss of the dirty bitmap is not the worst possible problem.

So, finally, it looks safe enough just to make the bitmaps on the source persistent again. Or better, instead of dropping the persistent flag, introduce another way to skip storing them (maybe with an additional flag, so everybody will be happy). Then, after the source resumes, we have one of the following situations:

1. The disk was not changed during migration, so all is OK and we have the bitmaps.
2. The disk was changed. The bitmaps are inconsistent. But not only the bitmaps: the whole VM state is inconsistent with its disk. This case is a bug in the management layer and should never happen. And possibly we need some separate way to catch such cases.

Best regards,

