qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/5] vfio/migration: define kernel interfaces


From: Zhao Yan
Subject: Re: [Qemu-devel] [PATCH 1/5] vfio/migration: define kernel interfaces
Date: Wed, 20 Feb 2019 20:47:45 -0500
User-agent: Mutt/1.9.4 (2018-02-28)

On Wed, Feb 20, 2019 at 06:08:13PM +0100, Cornelia Huck wrote:
> On Wed, 20 Feb 2019 02:36:36 -0500
> Zhao Yan <address@hidden> wrote:
> 
> > On Tue, Feb 19, 2019 at 02:09:18PM +0100, Cornelia Huck wrote:
> > > On Tue, 19 Feb 2019 16:52:14 +0800
> > > Yan Zhao <address@hidden> wrote:
> (...)
> > > > + *          Size of device config data is smaller than or equal to 
> > > > that of
> > > > + *          device config region.  
> > > 
> > > Not sure if I understand that sentence correctly... but what if a
> > > device has more config state than fits into this region? Is that
> > > supposed to be covered by the device memory region? Or is this assumed
> > > to be something so exotic that we don't need to plan for it?
> > >   
> > Device config data and device config region are all provided by vendor
> > driver, so vendor driver is always able to create a large enough device
> > config region to hold device config data.
> > So, if a device has data that are better to be saved after device stop and
> > saved/loaded in strict order, the data needs to be in device config region.
> > This kind of data is supposed to be small.
> > If the device data can be saved/loaded several times, it can also be put
> > into device memory region.
> 
> So, it is the vendor driver's decision which device information should
> go via which region? With the device config data supposed to be
> saved/loaded in one go?
Right, exactly.


> (...)
> > > > +/* version number of the device state interface */
> > > > +#define VFIO_DEVICE_STATE_INTERFACE_VERSION 1  
> > > 
> > > Hm. Is this supposed to be backwards-compatible, should we need to bump
> > > this?
> > >  
> > currently no backwords-compatible. we can discuss on that.
> 
> It might be useful if we discover that we need some extensions. But I'm
> not sure how much work it would be.
> 
> (...)
> > > > +/*
> > > > + * DEVICE STATES
> > > > + *
> > > > + * Four states are defined for a VFIO device:
> > > > + * RUNNING, RUNNING & LOGGING, STOP & LOGGING, STOP.
> > > > + * They can be set by writing to device_state field of
> > > > + * vfio_device_state_ctl region.  
> > > 
> > > Who controls this? Userspace?  
> > 
> > Yes. Userspace notifies vendor driver to do the state switching.
> 
> Might be good to mention this (just to make it obvious).
>
Got it. thanks

> > > > + * LOGGING state is a special state that it CANNOT exist
> > > > + * independently.  
> > > 
> > > So it's not a state, but rather a modifier?
> > >   
> > yes. or thinking LOGGING/not LOGGING as bit 1 of a device state,
> > whereas RUNNING/STOPPED is bit 0 of a device state.
> > They have to be got as a whole.
> 
> So it is (on a bit level):
> RUNNING -> 00
> STOPPED -> 01
> LOGGING/RUNNING -> 10
> LOGGING/STOPPED -> 11
> 

Yes.

> > > > + * It must be set alongside with state RUNNING or STOP, i.e,
> > > > + * RUNNING & LOGGING, STOP & LOGGING.
> > > > + * It is used for dirty data logging both for device memory
> > > > + * and system memory.
> > > > + *
> > > > + * LOGGING only impacts device/system memory. In LOGGING state, get 
> > > > buffer
> > > > + * of device memory returns dirty pages since last call; outside 
> > > > LOGGING
> > > > + * state, get buffer of device memory returns whole snapshot of device
> > > > + * memory. system memory's dirty page is only available in LOGGING 
> > > > state.
> > > > + *
> > > > + * Device config should be always accessible and return whole config 
> > > > snapshot
> > > > + * regardless of LOGGING state.
> > > > + * */
> > > > +#define VFIO_DEVICE_STATE_RUNNING 0
> > > > +#define VFIO_DEVICE_STATE_STOP 1
> > > > +#define VFIO_DEVICE_STATE_LOGGING 2
> 
> This makes it look a bit like LOGGING were an individual state, while 2
> is in reality LOGGING/RUNNING... not sure how to make that more
> obvious. Maybe (as we are dealing with a u32):
> 
> #define VFIO_DEVICE_STATE_RUNNING 0x00000000
> #define VFIO_DEVICE_STATE_STOPPED 0x00000001
> #define VFIO_DEVICE_STATE_LOGGING_RUNNING 0x00000002
> #define VFIO_DEVICE_STATE_LOGGING_STOPPED 0x00000003
> #define VFIO_DEVICE_STATE_LOGGING_MASK 0x00000002
>
Yes, yours are better, thanks:)

> > > > +
> > > > +/* action to get data from device memory or device config
> > > > + * the action is write to device state's control region, and data is 
> > > > read
> > > > + * from device memory region or device config region.
> > > > + * Each time before read device memory region or device config region,
> > > > + * action VFIO_DEVICE_DATA_ACTION_GET_BUFFER is required to write to 
> > > > action
> > > > + * field in control region. That is because device memory and devie 
> > > > config
> > > > + * region is mmaped into user space. vendor driver has to be notified 
> > > > of
> > > > + * the the GET_BUFFER action in advance.
> > > > + */
> > > > +#define VFIO_DEVICE_DATA_ACTION_GET_BUFFER 1
> > > > +
> > > > +/* action to set data to device memory or device config
> > > > + * the action is write to device state's control region, and data is
> > > > + * written to device memory region or device config region.
> > > > + * Each time after write to device memory region or device config 
> > > > region,
> > > > + * action VFIO_DEVICE_DATA_ACTION_GET_BUFFER is required to write to 
> > > > action
> > > > + * field in control region. That is because device memory and devie 
> > > > config
> > > > + * region is mmaped into user space. vendor driver has to be notified 
> > > > of
> > > > + * the the SET_BUFFER action after data written.
> > > > + */
> > > > +#define VFIO_DEVICE_DATA_ACTION_SET_BUFFER 2  
> > > 
> > > Let me describe this in my own words to make sure that I understand
> > > this correctly.
> > > 
> > > - The actions are set by userspace to notify the kernel that it is
> > >   going to get data or that it has just written data.
> > > - This is needed as a notification that the mmapped data should not be
> > >   changed resp. just has changed.  
> > we need this notification is because when userspace read the mmapped data,
> > it's from the ptr returned from mmap(). So, when userspace reads that ptr,
> > there will be no page fault or read/write system calls, so vendor driver
> > does not know whether read/write opertation happens or not. 
> > Therefore, before userspace reads the ptr from mmap, it first writes action
> > field in control region (through write system call), and vendor driver
> > will not return the write system call until data prepared.
> > 
> > When userspace writes to that ptr from mmap, it writes data to the data
> > region first, then writes the action field in control region (through write
> > system call) to notify vendor driver. vendor driver will return the system
> > call after it copies the buffer completely.
> > > 
> > > So, how does the kernel know whether the read action has finished resp.
> > > whether the write action has started? Even if userspace reads/writes it
> > > as a whole.
> > >   
> > kernel does not touch the data region except when in response to the
> > "action" write system call.
> 
> Thanks for the explanation, that makes sense.
> (...)
welcome:)
> _______________________________________________
> intel-gvt-dev mailing list
> address@hidden
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev



reply via email to

[Prev in Thread] Current Thread [Next in Thread]