Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration


From: Zhao Yan
Subject: Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration
Date: Thu, 14 Mar 2019 19:05:06 -0400
User-agent: Mutt/1.9.4 (2018-02-28)

On Fri, Mar 15, 2019 at 06:44:58AM +0800, Alex Williamson wrote:
> On Wed, 13 Mar 2019 21:12:22 -0400
> Zhao Yan <address@hidden> wrote:
> 
> > On Thu, Mar 14, 2019 at 03:14:54AM +0800, Alex Williamson wrote:
> > > On Tue, 12 Mar 2019 21:13:01 -0400
> > > Zhao Yan <address@hidden> wrote:
> > >   
> > > > hi Alex
> > > > Any comments on the sequence below?
> > > > 
> > > > Actually, we have some concerns about, and suggestions for,
> > > > userspace-opaque migration data.
> > > > 
> > > > 1. if data is opaque to userspace, the kernel interface must be
> > > > tightly bound to migration.
> > > >    e.g. the vendor driver has to know that state (running + not
> > > >    logging) should not return any data, and that state (running +
> > > >    logging) should return the whole snapshot first and dirty data
> > > >    later. it also has to know that qemu migration will not call
> > > >    GET_BUFFER in state (running + not logging); otherwise, it has
> > > >    to adjust its behavior.
> > > 
> > > This all just sounds like defining the protocol we expect with the
> > > interface.  For instance if we define a session as beginning when
> > > logging is enabled and ending when the device is stopped and the
> > > interface reports no more data is available, then we can state that any
> > > partial accumulation of data is incomplete relative to migration.  If
> > > userspace wants to initiate a new migration stream, they can simply
> > > toggle logging.  How the vendor driver provides the data during the
> > > session is not defined, but beginning the session with a snapshot
> > > followed by repeated iterations of dirtied data is certainly a valid
> > > approach.
> > >   
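
To make the session boundaries above concrete, here is a minimal C
sketch (the state bits and names are illustrative only, not a proposed
uapi):

#include <stdbool.h>
#include <stdint.h>

/* Illustrative state bits; not part of any proposed interface. */
#define DEV_STATE_RUNNING  (1u << 0)
#define DEV_STATE_LOGGING  (1u << 1)

/*
 * A session begins when logging is enabled and is complete only once
 * the device is stopped *and* the interface reports no more data.
 * Anything read before that point is a partial accumulation; toggling
 * logging discards it and starts a new session.
 */
static bool session_complete(uint32_t state, uint64_t pending_bytes)
{
    return (state & DEV_STATE_LOGGING) &&
           !(state & DEV_STATE_RUNNING) &&
           pending_bytes == 0;
}
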
> > > > 2. the vendor driver cannot ensure userspace gets all the data it
> > > > intends to save in the pre-copy phase.
> > > >   e.g. in the stop-and-copy phase, the vendor driver has to first
> > > >   check for, and send, data left over from the previous phase.
> > > 
> > > First, I don't think the device has control of when QEMU switches from
> > > pre-copy to stop-and-copy, the protocol needs to support that
> > > transition at any point.  However, it seems a simple data-available
> > > counter provides an indication of when it might be optimal to make such
> > > a transition.  If a vendor driver follows a scheme as above, the
> > > available data counter would indicate a large value, the entire initial
> > > snapshot of the device.  As the migration continues and pages are
> > > dirtied, the device would reach a steady state amount of data
> > > available, depending on the guest activity.  This could indicate to the
> > > user to stop the device.  The migration stream would not be considered
> > > completed until the available data counter reaches zero while the
> > > device is in the stopped|logging state.
> > >   
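
As a sketch of how userspace might act on such a counter during
pre-copy (every helper below is a hypothetical placeholder, not an
existing interface):

#include <stdint.h>

/* Hypothetical helpers standing in for the eventual uapi. */
extern uint64_t read_pending_bytes(int device_fd);
extern void     read_migration_chunk(int device_fd);
extern void     stop_device(int device_fd);

static void precopy_then_stop(int device_fd, uint64_t steady_threshold)
{
    /* Pre-copy: drain data while the device keeps dirtying state. */
    while (read_pending_bytes(device_fd) > steady_threshold) {
        read_migration_chunk(device_fd);
    }

    /* Steady state reached: stop the device and drain what is left. */
    stop_device(device_fd);
    while (read_pending_bytes(device_fd) > 0) {
        read_migration_chunk(device_fd);
    }
    /* Counter now zero in the stopped|logging state: stream complete. */
}
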
> > > > 3. if the whole sequence is tightly bound to live migration, can we
> > > > remove the logging state? what about adding two states, migrate-in
> > > > and migrate-out, so there are four states: running, stopped,
> > > > migrate-in, migrate-out.
> > > >    migrate-out is for the source side when migration starts. together
> > > >    with states running and stopped, it can substitute for state
> > > >    logging.
> > > >    migrate-in is for the target side.
> > > 
> > > In fact, Kirti's implementation specifies a data direction, but I think
> > > we still need logging to indicate sessions.  I'd also assume that
> > > logging implies some overhead for the vendor driver.
> > >  
> > ok. If you prefer logging, I'm ok with it. I just found migrate-in and
> > migrate-out more universal against changes in hardware requirements.
> > 
> > > > On Tue, Mar 12, 2019 at 10:57:47AM +0800, Zhao Yan wrote:  
> > > > > hi Alex
> > > > > thanks for your reply.
> > > > > 
> > > > > So, if we choose migration data to be userspace-opaque, do you think
> > > > > the below sequence is the right behavior for the vendor driver to
> > > > > follow:
> > > > > 
> > > > > 1. initially the LOGGING state is not set. If userspace calls
> > > > > GET_BUFFER on the vendor driver, the vendor driver should reject it
> > > > > and return 0.
> > > 
> > > What would this state mean otherwise?  If we're not logging then it
> > > should not be expected that we can construct dirtied data from a
> > > previous read of the state before logging was enabled (it would be
> > > outside of the "session").  So at best this is an incomplete segment of
> > > the initial snapshot of the device, but that presumes how the vendor
> > > driver constructs the data.  I wouldn't necessarily mandate the vendor
> > > driver reject it, but I think we should consider it undefined and
> > > vendor specific relative to the migration interface.
> > >   
> > > > > 2. then the LOGGING state is set. if userspace calls GET_BUFFER
> > > > > on the vendor driver,
> > > > >    a. the vendor driver should first query a whole snapshot of
> > > > >    device memory (let's use this term to represent the device's
> > > > >    standalone memory for now),
> > > > >    b. the vendor driver returns a chunk of the data just queried
> > > > >    to userspace, while recording the current position in the data.
> > > > >    c. the vendor driver finds all the data just queried has
> > > > >    finished transmitting to userspace, and now queries only dirty
> > > > >    data in device memory.
> > > > >    d. the vendor driver returns a chunk of the data just queried
> > > > >    (this time dirty data) to userspace, while recording the
> > > > >    current position in the data.
> > > > >    e. if all data has been transmitted to userspace and GET_BUFFERs
> > > > >    still come from userspace, the vendor driver starts another
> > > > >    round of dirty data query.
> > > 
> > > This is a valid vendor driver approach, but it's outside the scope of
> > > the interface definition.  A vendor driver could also decide to not
> > > provide any data until both stopped and logging are set and then
> > > provide a fixed, final snapshot.  The interface supports either
> > > approach by defining the protocol to interact with it.
> > >   
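
For illustration, the snapshot-then-dirty a-e sequence quoted above
could look roughly like this on the vendor-driver side (all names and
types here are invented; nothing is mandated by the interface):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct mig_stream {
    char   *buf;            /* staged data awaiting GET_BUFFER */
    size_t  len, pos;       /* staged length and read cursor */
    bool    snapshot_done;  /* initial snapshot already staged? */
};

/* Hypothetical vendor internals that (re)fill buf/len. */
extern void query_full_snapshot(struct mig_stream *s);
extern void query_dirty_data(struct mig_stream *s);

static size_t vendor_get_buffer(struct mig_stream *s, char *out, size_t n)
{
    if (s->pos == s->len) {          /* staged data fully transmitted */
        if (!s->snapshot_done) {
            query_full_snapshot(s);  /* (a) stage the whole snapshot */
            s->snapshot_done = true;
        } else {
            query_dirty_data(s);     /* (c), (e) stage a dirty round */
        }
        s->pos = 0;
    }
    if (n > s->len - s->pos) {       /* (b), (d) hand back one chunk */
        n = s->len - s->pos;
    }
    memcpy(out, s->buf + s->pos, n);
    s->pos += n;                     /* remember position in the data */
    return n;
}
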
> > > > > 3. if the LOGGING state is then unset, and userspace calls GET_BUFFER
> > > > > on the vendor driver,
> > > > >    a. if the vendor driver finds there is previously untransmitted
> > > > >    data, it returns that data until all of it is transmitted.
> > > > >    b. the vendor driver then queries dirty data again and transmits it.
> > > > >    c. at last, the vendor driver queries device config data (which has
> > > > >    to be queried last and sent only once) and transmits it.
> > > 
> > > This seems broken, the vendor driver is presuming the user intentions.
> > > If logging is unset, we return to bullet 1, reading data is undefined
> > > and vendor specific.  It's outside of the session.
> > >   
> > > > > for the first bullet, if the LOGGING state is first set and migration
> > > > > then aborts, the vendor driver has to be able to detect that
> > > > > condition. so seemingly, the vendor driver has to know more about
> > > > > qemu's migration state, e.g. that migration was started and then
> > > > > failed. Do you think that's acceptable?
> > > 
> > > If migration aborts, logging is cleared and the device continues
> > > operation.  If a new migration is started, the session is initiated by
> > > enabling logging.  Sound reasonable?  Thanks,
> > >  
> > 
> > For the flow, I still have a question.
> > There are two approaches below; which one do you prefer?
> > 
> > Approach A, in the pre-copy stage, the sequence is
> > 
> > (1)
> > .save_live_pending --> return whole snapshot size
> > .save_live_iterate --> save whole snapshot
> > 
> > (2)
> > .save_live_pending --> get dirty data, return dirty data size
> > .save_live_iterate --> save all dirty data
> > 
> > (3)
> > .save_live_pending --> get dirty data again, return dirty data size
> > .save_live_iterate --> save all dirty data
> > 
> > 
> > Approach B, in the pre-copy stage, the sequence is
> > (1)
> > .save_live_pending --> return whole snapshot size
> > .save_live_iterate --> save part of snapshot
> > 
> > (2)
> > .save_live_pending --> return the rest of the whole snapshot size +
> >                        current dirty data size
> > .save_live_iterate --> save part of snapshot
> > 
> > (3) repeat (2) until the whole snapshot is saved.
> > 
> > (4)
> > .save_live_pending --> get dirty data and return current dirty data size
> > .save_live_iterate --> save part of dirty data
> > 
> > (5)
> > .save_live_pending --> return the rest of the dirty data size +
> >                        delta size of dirty data
> > .save_live_iterate --> save part of dirty data
> > 
> > (6)
> > repeat (5) until pre-copy stops
> 
> I don't really understand the question here.  If the vendor driver's
> approach is to send a full snapshot followed by iterations of dirty
> data, then when the user enables logging and reads the counter for
> available data, it should report the (size of the snapshot).  The next
> time the user reads the counter, it should report
> (size of the snapshot) - (what the user has already read) + (size of
> the dirty data since the snapshot).  As the user continues to read past
> the snapshot data, the available data counter transitions to reporting
> only the size of the remaining dirty data, which is monotonically
> increasing.  I guess this would be more similar to your approach B,
> which seems to suggest that the interface needs to continue providing
> data regardless of whether the user fully exhausted the available data
> from the previous cycle.  Thanks,
>
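
Spelling that arithmetic out as a sketch (variable names are
illustrative only):

#include <stdint.h>

/*
 * Pending-data counter for the snapshot-then-dirty scheme.
 * dirty_total is the cumulative dirty data produced since the
 * snapshot; bytes_read is everything the user has consumed so far.
 */
static uint64_t pending_bytes(uint64_t snapshot_size, uint64_t dirty_total,
                              uint64_t bytes_read)
{
    if (bytes_read < snapshot_size) {
        /* Still inside the snapshot: remainder plus dirty backlog. */
        return (snapshot_size - bytes_read) + dirty_total;
    }
    /* Past the snapshot: only the not-yet-read dirty data remains. */
    return dirty_total - (bytes_read - snapshot_size);
}
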

Right. But regarding the VFIO migration code in QEMU, rather than saving
one chunk each time, do you think it is better to exhaust all the data
reported by .save_live_pending in each .save_live_iterate callback?
(Even though the vendor driver will handle the case where userspace
cannot exhaust all the data, QEMU's VFIO code can still try to save as
much of the available data as it can each time.)
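
A sketch of that "exhaust everything" variant against QEMU's
SaveVMHandlers signature (vfio_read_migration_chunk() and the cached
migration_pending field are hypothetical, not existing QEMU code):

/* Sketch only; assumes QEMU's usual headers. */
static int vfio_save_iterate(QEMUFile *f, void *opaque)
{
    VFIODevice *vbasedev = opaque;
    uint64_t remaining = vbasedev->migration_pending; /* hypothetical */

    /* Drain everything .save_live_pending reported, chunk by chunk. */
    while (remaining > 0) {
        ssize_t n = vfio_read_migration_chunk(vbasedev, f, remaining);
        if (n <= 0) {
            break;  /* device had less than reported; keep what we got */
        }
        remaining -= n;
    }
    return 0;
}
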

> Alex


