Re: bitmap migration bug with -drive while block mirror runs

qemu-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: bitmap migration bug with -drive while block mirror runs

From:	John Snow
Subject:	Re: bitmap migration bug with -drive while block mirror runs
Date:	Thu, 3 Oct 2019 19:34:56 -0400
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.0

On 10/3/19 6:14 AM, Vladimir Sementsov-Ogievskiy wrote:
> 03.10.2019 0:35, John Snow wrote:
>> On 10/2/19 6:46 AM, Peter Krempa wrote:
>>
>> [ * poof * ]
>>
>>>
>>> I'd like to re-iterate that the necessity to keep node names same on
>>> both sides of migration is unexpected, undocumented and in some cases
>>> impossible.
>>>
>>> If you want to mandate that they must be kept the same please document
>>> it and also note the following:
>>>
>>> - during migrations the storage layout may change e.g. a backing chain
>>>    may become flattened, thus keeping node names stable beyond the top
>>>    layer is impossible
>>>
>>
>> The struct and layout of the graph is entirely unrelated to the
>> requirement that a bitmap attached to a node with a name on the source
>> needs to have a node with the same name on the destination. It's an
>> addressability requirement only.
>>
>> Change it entirely to a new drive if you want, move it up or down the
>> graph, it doesn't matter.
>>
>> Libvirt is in the best position to understand where bitmaps already are
>> and where it wants them to go.
>>
>>> - in some cases (readonly image in a cdrom not present on destination,
>>>    thus not relevant here probably) it may even become impossible to
>>>    create any node thus keeping the top node may be impossible
>>>
>>
>> It's not mandatory to recreate the graph exactly. Consider what you are
>> saying:
>>
>> - Libvirt adds a bitmap to node "N"
>> - Libvirt asks QEMU for bitmap migration
>> - Libvirt migrates to a QEMU instance that not only does not have a node
>> "N", but has no analogous node at all!
>>
>> I believe this is right to fail as there is no way to fulfill the
>> request as-is.
>>
>> (More below if you feel it's valid to migrate only some bitmaps.)
>>
>>> - it should be documented when and why this happens and how management
>>>    tools are supposed to do it
>>>
>>
>> OK, agreed, and I am sorry that our existing story has been hand-wavey.
>>
>> Let me tell you the exact specifics of the current broken logic so you
>> can understand the requirements as they exist right now.
>>
>> 1. Bitmaps attempt to use their device name to migrate, if available.
>> This covers 99% of use cases where a bitmap was added to a node that was
>> attached directly to a device model.
>>
>> This includes almost all usual cases: bitmaps loaded from qcow2 files,
>> bitmaps added via QMP to a root node, bitmaps added via QMP to a drive name.
>>
>> (It does not include cases where bitmaps are intentionally added to
>> nodes that aren't a device root. Libvirt, I believe, can simply never do
>> this and it will never come up.)
>>
>> 2. If a device name isn't available because this bitmap is not attached
>> to a root OR the BB does not have a name, we migrate using the node name.
>>
>> 3. No attention is paid whatsoever to whether a node name is
>> automatically generated or not. In effect, if the device name lookup
>> fails we currently "trust" that the node name is something meaningful.
>>
>> 4. The bug as I originally perceived of it relates specifically to our
>> failure to resolve the device name after graph manipulations.
>>
>>
>> Under these rules, if we "fixed" #4, node-names wouldn't show up in the
>> stream at all if you never attached a bitmap to a non-root node. This is
>> probably what you expected.
>>
>> Node-names only feature in cases where we can't find a device/drive
>> name, which is:
>> A. When a bitmap is attached to a non-root node specifically. Libvirt
>> can simply never do this!
>> B. When under a graph transformation for drive-mirror; point #4 above.
>>
>>
>> The workaround for this bug if we don't find a good policy:
>>
>> 1. Use blockdev.
>> 2. Give explicit, semantic names to the root nodes that represent the drive.
>> 3. Any name used to add a bitmap must appear on the destination in a
>> migration.
>>
>>
>>> - please let me know what's actually expected, since libvirt
>>>    didn't enable blockdev yet we can fix any unexpected expectations
>>>
>>
>> I have been and will continue to be diligent in CCing you and libvirt list.
>>
>> At the moment I am still leaning towards the idea that libvirt should
>> expect that any bitmaps attached to a node with an explicit node-name
>> will want to use those names to migrate, but that we might be able to
>> limit the cases such that you will be able to avoid the circumstance
>> entirely.
>>
>> However, QEMU's actual implementation is that they are node object. QEMU
>> is ill-equipped to make semantic decisions about what the bitmaps "mean"
>> or "represent"; the name is unfortunately the most explicit identifier
>> we have to convey what bitmap we are talking about.
>>
>> It will be libvirt's job to use node names to help facilitate QEMU's
>> transfer of these objects during migration in a semantically helpful way.
>>
>>> - Document it so that the expectations don't change after this.
>>>
>>
>> OK. I will take charge on this, once we reach a consensus.
>>
>>> - Ideally node names will not be bound to anything and freely
>>>    changeable. If necessary we can provide a map to qemu during migration
>>>    which is probably less painful and more straightforward than keeping
>>>    them in sync somehow ...
>>
>> Why do you want node names to be freely manipulable?
>>
>> The only constraint we've actually added is that a root node (that has a
>> bitmap) attached to a device needs to have a name that is available on
>> the target.
>>
>> (Oh, and, that the virtual size of that target matches the source.)
>>
>>>
>>
>>
>>
>> Phew. In terms of non-direct replies to Peter's questions above, I've
>> written out like a dozen failed replies to this, so I'm still quite
>> confused but need to work on other things today.
>>
>> I currently think that:
>>
>>
>> 1. If a user uses block-dirty-bitmap-add, we have some sense of where
>> they wanted the bitmap to go in the graph because they specified a name.
>> Migration, if left as an automatic (opt-in) process, should try to
>> migrate in-kind:
>>
>> - If the user used a drive name, try to use a drive name to migrate. If
>> there is no drive name and our node name is autogenerated, we cannot
>> migrate this bitmap.
>> - If the user used an explicit, non-generated node name, use the node
>> name. If the user used an implicit node-name, we need to try to resolve
>> the device name again. If that's not possible, the bitmap cannot be
>> migrated.
>>
> 
> Personally, I don't like the idea of separating auto-generated node-names from
> user specified.. But I inderstand that there is a risk of selecting wrong node
> on target because of this..
> 

The way I see it, we know an auto-generated node name will never be
correct, but an explicitly specified one represents an explicit user
configuration.

It's wrong to use generated names for migration details, but it's never
wrong to use explicit configuration.

So I believe they are /already/ distinct in nature. We even already have
code that can tell them apart.

> Also, I'm afraid we can't rely on which name user used when created bitmap, as
> we don't want to store this information into qcow2, when we shutdown vm..
> 

Right, I was thinking about this as I wrote my long email, but believe
that it poses no real problem in the "no explicit mapping" default case.

There are four cases here:

- The bitmap is loaded to a root node with an explicit name
- The bitmap is loaded to a non-root node with an explicit name

The blockdev case with persistence. The name represents explicit user
configuration and can be relied upon in the destination.

- The bitmap is loaded to a root node with an implicit name, with a named BB

The -drive case. The named BB represents the explicit user configuration
and can be relied upon.

- The bitmap is loaded to a non-root node with an implicit name.

There is no known explicit user configuration that identifies the node
this bitmap is attached to; it cannot be migrated without an explicit
override mapping.


Why I like this approach:
- We always use explicit user configuration.
- We do not attempt to migrate bitmaps that were not explicitly
positioned by the user without further configuration.

>>
>> This implies that QEMU will try to "guess" where bitmaps go when using
>> -drive/-device, but will rely on explicit configuration when using
>> blockdev. I think the spirit of this idea is correct.
>>
>> (Vladimir: this is indeed different from EITHER of my suggested
>> resolution orders over the last two days.)
>>
>>
>>
>> 2. I like Vladimir's idea of providing a "default" migration approach,
>> but allowing libvirt to override some features of it if necessary.
>>
>> Overrides that I think will be helpful in alleviating any pain in the
>> long term:
>>
>> - Whitelists / Blacklists
>>
>> The ability to provide either a whitelist or a blacklist for bitmaps
>> that we desire to migrate. The default can continue to be: "All bitmaps
>> with a name." This will allow libvirt to drop bitmaps at its discretion
>> if it performs a block graph reconfiguration on migration and the bitmap
>> is no longer semantically relevant or appropriate for whatever reason.
>> This is superior to explicitly deleting bitmaps or dropping nodes in
>> order to have a valid recourse on failed migrations.
>>
>>
>> - The ability to override specific mappings on an as-needed basis. I
>> believe the default resolution mechanism should be one that behaves like
>> I specify above; but if that resolution is untenable for some reason,
>> you can specify a remapping if you really require.
>>
>> I am actually hoping that remapping is actually not necessary, because I
>> think it's sufficient to use node-names to explicitly direct bitmaps to
>> their intended targets.
>>
>> But if we truly do need that power, I'm open to providing an interface
>> to specify it.
>>
>>
>>
>> I hope everyone is as confused as I am, now.
>> --js
>>
> 
> I now feel myself closer to the idea that node-names are a property of running
> Qemu instance, and may change after vm restart or migration. We may add/remove
> drives which are including persistent dirty bitmaps, so node-name is not a
> persistent property of the bitmap. Note, that we store bitmap names in qcow2,
> but we never store node-name.
> 

You're right that node-names aren't an indelible property of the bitmap
itself in the same way that granularity and size are.

However, I do think that node-names are a run-time attribute  of bitmaps
in that bitmaps have an "address" -- their (node, name) pair -- that has
always served as their identifier in the QMP API.

To say that this is an internal detail is misleading, I think. It has
always been the single most visible identifier of bitmaps at runtime.

Does it make sense to think that a bitmap you added with "add
node-name=NodeX bitmap=MyBitmap" would do anything other than associate
with NodeX on the destination?

That's surprising to me.

> Therefore, I think it's OK to drop node-name default matching at all, because:
> 
> 1. We don't need it for backward compatibility, as it never worked, because of
> autogeneration of node-names.

Well, part of it worked. I always tested with explicitly named nodes on
both ends of the migration because I assumed that this was a reasonable
constraint -- I expected it to break if you named them incorrectly.
That's not too different from how migration works already: If you
misconfigure the target, it's going to go poorly.

What I never tested was a configuration that accidentally sent a
generated node-name, which is something that I am adamant we must
prevent. However, on this subject and related to your point below:

> 2. It may lead to occasional migration of the bitmap to wrong node, which may
> lead to some kind of corrupted backup. (it's always safer to drop bitmap than 
> to
> wrongly migrate it)
> 
> So, I'd leave default to migrate bitmaps in root node (may be, though some 
> filters)
> by Blk node name. And don't touch node-names.
> 

... I'm actually OK with this approach for right now, provided that we
ensure the drive-mirror case works again. This is apparently quite
complex and I'm okay with taking the time to make sure we all agree.

However, this might cause a regression for anyone relying on named node
migrations already; we should be careful there.

> ====
> 
> If we implement migration-set-bitmaps-mapping qmp command, I think it should
> override the default, so that blk names are not involved at all. And this map
> should define the whole bitmaps migration picture, without any defaults, like 
> this:
> 
> MigrationMap = [ MigrationEntry, ... ]
> MigrationEntry = { MigrationSource, MigrationTarget }
> MigrationSource = NodeName (means all bitmaps from the node) | {NodeName, 
> BitmapName}
> MigrationTarget = NodeName | Null (means remove the bitmap)
> 
> All named bitmaps must be covered by this definition, otherwise - return 
> error.
> 

I like this grammar. If we wind up wanting or needing this, this is a
good start.

> The definition is extendable, if we decide at some point to allow to change 
> bitmap name
> or to change properties (persistence, enabled) it will be possible.
> 

Sometimes I like the idea of an overlay configuration where you don't
have to specify the entire mapping, but instead just extend a default
behavior with the ability to specify the entire mapping.

Libvirt, of course, would naturally always specify the entire
configuration explicitly.

In my vision:

- QEMU migrates any bitmap it has a "concrete name" for that was
unambiguously provided directly by the user.
- For bitmaps with ambiguous locations (no named BB direct ancestor, no
explicit node-name) we cannot migrate without further config; fail.
- QEMU allows as-needed mappings/overrides on a per-bitmap basis to
change behavior as desired.

Why I like this approach:
- It actually doesn't involve any magic; it's using only explicit user
configuration.
- The defaults are arguably correct for any situation that does not
incur an error during initialization.
- It does not involve any values that were guessed without user input,
so the migration should not be dangerous. If the user gets the node
names wrong on the destination, that is explicitly their fault for
asking for the wrong thing.
- It allows a user to engage a bitmap migration with less scaffolding to
accomplish it. If it is at all possible, I prefer an API that does not
require extremely verbose configuration to operate in normative cases. I
recognize that this is not always possible.

If you are worried that some defaults might "slip through" without
getting overridden, it's always OK to add a "nodefaults" flag so that
libvirt can be solidly assured that is in complete and total control of
every last detail.

(But, if it's simpler to say that manual configuration is always "all or
nothing" for the sake of implementation, that's a good reason not to do
something fancier. I'm just stating my preference in terms of what I
like as a user.)

> ====
> 
> If it's a problem for libvirt to keep same node-names, why should we insist?
> 
> 

Is it really a problem? If libvirt requests migration of bitmaps, those
bitmaps need to go somewhere. Even if it constructs a different graph
for valid reasons, it should still understand which qcow2 nodes
correlate to which other qcow2 nodes and name them accordingly.

I don't see why this is actually a terrible constraint. Even in our
mapping proposal we're still using node-names.


So here's a summary of where I stand right now:

- I want an API in QEMU that doesn't require libvirt.

- I want to accommodate libvirt, but don't understand the requirement
that node-names must be ephemeral.

- I would like to define a set of default behaviors (when bitmap
migration is enabled) that migrates only bitmaps it is confident it can
do so correctly and error out when it cannot.

- I'd like to amend the bitmap device name resolution to accommodate the
drive-mirror case.

- Acknowledging that there might be cases where the defaults just simply
aren't powerful enough, allow a manual configuration that allows us to
select or deselect bitmaps and explicitly set their destination node-name.


Sorry to have typed so many words about this; the path forward was not
necessarily clear and we all live in quite different timezones.

--js
[Prev in Thread]
Current Thread
[Next in Thread]
Re: bitmap migration bug with -drive while block mirror runs, (continued)
Prev by Date: Re: [Qemu-devel] [PATCH v2 09/15] target/arm/arm-semi: Factor out implementation of SYS_READ
Next by Date: Re: [Qemu-devel] [PATCH v2 10/15] target/arm/arm-semi: Factor out implementation of SYS_ISTTY
Previous by thread: Re: bitmap migration bug with -drive while block mirror runs
Next by thread: Re: bitmap migration bug with -drive while block mirror runs
Index(es):
- Date
- Thread