qemu-devel


Re: [Qemu-devel] [PATCH 4/5] integratorcp: convert integratorcm to VMSta


From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH 4/5] integratorcp: convert integratorcm to VMState
Date: Tue, 08 Nov 2011 09:32:47 -0600

On 11/08/2011 09:15 AM, Avi Kivity wrote:
On 11/08/2011 05:04 PM, Anthony Liguori wrote:
What state is that?  Some devices have fixed size, offset, parent, and
enable/disable state (is there a word for that?), so there is no state
that needs to be transferred.  For other devices this is all dynamic.

Any mutable state should be save/restored.  Immutable state doesn't
need to be saved as it's created as part of the device model.

The memory API doesn't know which fields are mutable and which are not.

Right, but sending immutable fields is merely wasteful; it's not functionally incorrect.


If the question is, how do we restore the immutable state, that should
be happening as part of device creation, no?

The way I see it, we create a link between some device state (a
register) and a memory API field (like the offset).  This way, when one
changes, so does the other.  In complicated devices we'll have to write
a callback.
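
A minimal sketch of such a link, written in Python rather than against the real memory API. The names `MemoryRegion`, `RemapDevice`, and `ctrl`, and the register layout (bit 0 enables the region, the upper 20 bits hold a page-aligned base), are all hypothetical and chosen only for illustration:

```python
class MemoryRegion:
    """Hypothetical stand-in for a memory API region (not the QEMU type)."""

    def __init__(self, offset=0, enabled=False):
        self.offset = offset
        self.enabled = enabled


class RemapDevice:
    """Hypothetical device whose control register drives the region fields."""

    def __init__(self):
        self.region = MemoryRegion()
        self._ctrl = 0

    @property
    def ctrl(self):
        return self._ctrl

    @ctrl.setter
    def ctrl(self, value):
        # Every register write goes through here, so the region can
        # never drift out of sync with the register.
        self._ctrl = value & 0xFFFFFFFF
        self._sync_region()

    def _sync_region(self):
        # Assumed layout: bit 0 = enable, bits 12..31 = page-aligned base.
        self.region.enabled = bool(self._ctrl & 1)
        self.region.offset = self._ctrl & 0xFFFFF000


dev = RemapDevice()
dev.ctrl = 0x10000001  # guest writes the register; the region follows
```

In this sketch the register is the only mutable state, and the offset and enable flag are derived from it on every write.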

In devices where we dynamically change the offset (it's mutable), we
should save the offset and restore it.  Since offset is sometimes
mutable and sometimes immutable, we should always save/restore it.  In
the cases where it's really immutable, since the value isn't changing,
there's no harm in doing save/restore.

There is: you're taking an implementation detail and making it into an
ABI.  Change the implementation and migration breaks.

Yes, that's a feature, not a bug. If we send too little state today in version X, then discover this while working on version X + 1, we have no recourse. We have to blacklist version X.

Discovering this is hard because we have to find a symptom of broken migration. The symptom is often subtle, such as, "if you migrate while a floppy request is in flight, the request is lost resulting in a timeout in the guest kernel".

If we send too much state (internal implementation that is derived from something else) in version X, then discover this while working on version X + 1, we can filter the incoming state in X + 1 to just ignore the extra state and derive the correct internal state from the other stable registers.
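
As a rough illustration of filtering on the receiving end, here is a sketch in Python. It does not use VMState; the stream is modelled as a plain dictionary, and the field names `ctrl` and `offset`, the version numbers, and the derivation rule are assumptions made up for the example:

```python
# Fields that the newer version (X + 1) treats as real, stable state.
KNOWN_FIELDS_V2 = ("ctrl",)


def save_v1(dev):
    """Version X: sends everything, including the derived offset."""
    return {"version": 1, "ctrl": dev["ctrl"], "offset": dev["offset"]}


def derive_offset(ctrl):
    """Hypothetical rule: the offset is the page-aligned part of ctrl."""
    return ctrl & 0xFFFFF000


def load_v2(stream):
    """Version X + 1: accepts both versions, ignores the extra state."""
    if stream["version"] not in (1, 2):
        raise ValueError("unsupported version %r" % stream["version"])
    # Keep only the stable registers; anything else in the stream is dropped.
    state = {name: stream[name] for name in KNOWN_FIELDS_V2}
    # Rebuild the internal state from the stable registers.
    state["offset"] = derive_offset(state["ctrl"])
    return state


old_stream = save_v1({"ctrl": 0x10000001, "offset": 0x10000000})
restored = load_v2(old_stream)
```

The reverse direction has no equivalent: if version X never sent a field, there is nothing for X + 1 to filter or derive from.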

Discovering cases like this is easy because migration fails directly--not indirectly through a functional regression. That means this is something we can very easily catch in regression testing.
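
A sketch of why this kind of mismatch is cheap to catch, assuming a loader that rejects any stream whose field set differs from what it expects. The loader, the `MigrationError` type, and the field names are hypothetical and stand in for whatever check the real stream format performs:

```python
class MigrationError(Exception):
    """Raised when an incoming stream does not match the expected layout."""


# What the receiving version expects (hypothetical field set).
EXPECTED_FIELDS = {"ctrl"}


def strict_load(stream):
    """Load a stream, failing loudly on any layout mismatch."""
    fields = set(stream)
    if fields != EXPECTED_FIELDS:
        raise MigrationError(
            "unexpected=%s missing=%s"
            % (sorted(fields - EXPECTED_FIELDS), sorted(EXPECTED_FIELDS - fields))
        )
    return dict(stream)


def migration_fails(stream):
    """Helper for a regression test: True if the load is rejected."""
    try:
        strict_load(stream)
    except MigrationError:
        return True
    return False
```

A regression test only needs to feed a stream saved by the old version into `strict_load` and check the result. State that neither version knows about never shows up in this check, which is why the too-little-state case is so much harder to find.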

I actually think this is the way to do it too. Save/restore everything by default and then, as we develop and discover migration breaks, add filtering in the new versions to ignore and not send internal state. I don't think there's a tremendous amount of value in proactively filtering internal state. A lot of internal state never changes over a long period of time.

Yes, we could save just the device register, and use a callback to
regenerate the offset.  But that adds complexity and leads to more
save/restore bugs.

We shouldn't be reluctant to save/restore derived state.  Whether we
send it over the wire is a different story.  We should start by saving
as much state as we need to, and then sit down and start removing
state and adding callbacks as we need to.

"saving state without sending it over the wire" is another way of saying
"not saving state".

Or filtering it on the receiving end.  That's the fundamental difference.

Why?  The only thing that removing it does is create additional
complexity for save/restore.  You may argue that sending minimal state
improves migration compatibility but I think the current state of
save/restore is an existence proof that this line of reasoning is
incorrect.

It doesn't create additional complexity for save/restore, and I don't
think that the current state of save/restore proves anything except that
it needs a lot more work.

It's very hard to get the style of save/restore that we do right.

Regards,

Anthony Liguori




