qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bitmap migration bug with -drive while block mirror runs


From: John Snow
Subject: Re: bitmap migration bug with -drive while block mirror runs
Date: Tue, 1 Oct 2019 12:23:02 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.0


On 10/1/19 11:58 AM, Kevin Wolf wrote:
> Am 01.10.2019 um 17:09 hat John Snow geschrieben:
>>
>>
>> On 10/1/19 5:54 AM, Kevin Wolf wrote:
>>> Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 01.10.2019 3:09, John Snow wrote:
>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>>> found and thought you'd like to see it:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>>
>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>>
>>>>>
>>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>>> source-node? It looks like at the moment both consider their only parent
>>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>>
>>>>>
>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>> above that node.)
>>>>
>>>>
>>>> Better would be to migrate by node-name only.. But am I right that
>>>> node-names are different on source and destination? Or this situation
>>>> changed?
>>>
>>> Traditionally, I think migration assumes that frontends (guest devices)
>>> must match exactly, but backends may and usually will differ.
>>>
>>> Of course, dirty bitmaps are a backend feature that isn't really related
>>> to guest devices, so this doesn't really work out any more in your case.
>>> BlockBackend names are unusable for this purpose (especially as we're
>>> moving towards anonymous BlockBackends everywhere), which I guess
>>> essentially means node-name is the only option left.
>>>
>>
>> The problem as I see it involves API stability.
>>
>> We allow block-dirty-bitmap-add against e.g. "drive1" through the
>> block-backend name (the name of the "drive" as the user sees it.)
>>
>> Of course, once you start mirror, you aren't able to access that bitmap
>> through that namepair anymore -- the "address" of the bitmap has "changed"!
>>
>> (In actual fact, the bitmap always had two addresses; and simply we lost
>> an alias -- but it's the one that the user likely used to create the
>> bitmap, so that's bad.)
> 
> So if I understand correctly, the problem is that without a filter, in
> some setups we get a usable BlockBackend name like "drive1", but when a
> filter is added, we return the node-name instead which is
> auto-generated and will be different on the destination.
> 
> Looking at the ChildRole documentation:
> 
>     /* Returns a name that is supposedly more useful for human users than the
>      * node name for identifying the node in question (in particular, a BB
>      * name), or NULL if the parent can't provide a better name. */
>     const char *(*get_name)(BdrvChild *child);
> 
> I'd argue that a BlockBackend name is more useful for a human user even
> across filter, so I'd support a .get_name implementation for a filter
> child role (which Max wanted to introduce anyway for his filter series).
> 
> Of course, if you have a function that is made to return a convenient
> text for human users, and you use it for a stable ABI like the migration
> stream, this is an idea that would certainly have caused an entertaining
> Linus rant in the good old kernel times.
> 
>>> Is bitmap migration something that must be enabled explicitly or does
>>> it happen automatically? If it's explicit, then making an additional
>>> requirement (matching node-names) shouldn't be a problem.
>>
>> This means that bitmap migration becomes a blockdev-only feature.
> 
> I meant this more as the preferred way for the future rather than the
> only thing supported.
> 

My confusion then is how do you get the node-names to match *without*
using blockdev?

Or maybe I misunderstand.

We can always use explicit node-names whenever available, but if we
don't make blockdev a requirement, we need a fallback to determine an
addressable name.

> But Peter has actually mentioned that for libvirt it will be
> blockdev-only anyway. So do we even have a good reason to invest much
> for the non-blockdev case?
> 
> Maybe making it blockdev-only is actually pretty reasonable.
> 
>> Serious question: do we have plans to formally deprecate things like
>> -drive and mandate a blockdev workflow, or otherwise work to unify the
>> actual graph that gets created between the two methods?
> 
> It's high on my wishlist, though we can't before libvirt uses blockdev.
> 
> Maybe something to talk about at KVM Forum?
> 

Yes, please!

I've not well-understood the trajectory the block graph is taking. I
even wrote a graph visualization so I could understand the problem.

I think I have a better grip on what we have *currently* right now, so
I'd be keen to hear your roadmap for what we want to have and how it
ties into the CLI and QAPI.

>>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>>>> that simply bypasses the filter that was inserted and serves no real
>>>>> purpose other than to allow the child to have a parent link and find who
>>>>> it's """real""" parent is.
>>>>>
>>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>>>> feasible this quick idea might be, though.
>>>>>
>>>>>
>>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>>>> to go over the wire is an autogenerated node: this is never correct!
>>>>>
>>>>> (Why not? because the target is incapable of matching the node-name
>>>>> because they are randomly generated AND you cannot specify node-names
>>>>> with # prefixes as they are especially reserved!
>>>>>
>>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>>>> with autogenerated names, you will be unable to migrate them.))
>>>>>
>>>>
>>>> In other words, we need a well defined way to match nodes on source and 
>>>> destination,
>>>> keeping in mind filters, to migrate bitmaps correctly.
>>>>
>>
>> Yes, exactly.
>>
>>>> Hm, did you thought about bitmaps in filters? It's not a problem to create 
>>>> bitmap in
>>>> mirror-top filter during mirror job:)
>>>>
>>>> Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 
>>>> file child bs?
>>>>
>>>> If node-names are different on source and destination, what is the same? 
>>>> Top blk name
>>>> and bdrv-children names (I recently saw Max's idea to check node "path" in 
>>>> iotest).
>>>
>>> blk_name has to be assumed to be "". The BdrvChild path changes when
>>> filters are inserted (and inserting filters on the destination that
>>> aren't present on the source, or vice versa, sounds like something that
>>> should just work).
>>>
>>> So both parts of this are not great ways for addressing nodes.
>>>
>>>> So, actually node is migration-addressable, if path 
>>>> <blk-name>/root[/child-name] to the
>>>> defines this node directly (we must not have children with same name for 
>>>> some node in
>>>> the path).
>>>>
>>>> And I think it's a correct way to define node in migration stream - by 
>>>> path.
>>>
>>> I'm afraid node-name is the only thing that could possibly work reliably
>>> for identifying nodes.
>>>
>>> Kevin
>>>
>>
>> It sounds like you are saying that bitmaps must become a blockdev-only
>> feature.
> 
> More like we can't rely on BlockBackend names in a blockdev setup.

Agree; so this discussion is now mostly about what the fallback looks like.

> BlockBackend names can work in a purely traditional setup if we make
> some effort to find the block backend even with filters in bettwen, but
> they aren't universal even then. And node-names work in the blockdev
> case, but they aren't universal either.
> 
> But as I said above, you made me wonder whether making it a
> blockdev-only feature wouldn't actually be a good idea.
> 

O:-) It's the most logically consistent, but we should figure out how to
make new CLI sugar that helps make our story consistent. Or get rid of
the CLI sugar, I guess.

(I kind of like sugar, perhaps infamously, but if our view is that QEMU
is a library and API and nothing more, the sugar can go and be replaced
by sugar in binding libraries that provide said sugar.)

>> I'm not sure if I have arrived at that conclusion yet, but it's at least
>> inarguable that with blockdev it's a lot simpler to guarantee correctness.
>>
>> However, we still have -cdrom and -hda and -drive and any number of
>> sugars that I think we aren't committed to getting rid of yet... (or ever?)
> 
> -hda and -cdrom currently have hard-coded BlockBackend names. I don't
> see what would stop us from changing this to hard-coded node names.
> 

Ideally, I'd see something like -hda and -cdrom become strict sugar for
-drive, which becomes strict sugar for -device/-blockdevice.

So that everything is using the same mechanisms and is well-defined.

I guess. Perhaps.

Let's really talk very seriously about this at KVM Forum -- it aligns
with some pet projects I want to investigate; namely simplifying the
QEMU CLI such that we can generate better documentation.

> -drive is something that we probably need to remove in its current form
> and introduce something new that is both user-friendly and powerful.
> Actually, we should probably add the replacement first. :-)
> 

Great! Looking forward to Lyon.

--js



reply via email to

[Prev in Thread] Current Thread [Next in Thread]