qemu-devel

Re: [Qemu-devel] Assigning an eth port to a guest VM


From: Alex Williamson
Subject: Re: [Qemu-devel] Assigning an eth port to a guest VM
Date: Tue, 16 Jun 2015 08:43:21 -0600

On Tue, 2015-06-16 at 11:21 +0000, Yehuda Yitschak wrote:
> 
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:address@hidden]
> > Sent: Monday, June 15, 2015 21:32
> > To: Yehuda Yitschak
> > Cc: Eric Auger; address@hidden; Yuval Caduri; Shadi Ammouri
> > Subject: Re: Assigning an eth port to a guest VM
> > 
> > On Mon, 2015-06-15 at 17:45 +0000, Yehuda Yitschak wrote:
> > > ________________________________________
> > > From: Alex Williamson <address@hidden>
> > > Sent: Monday, June 15, 2015 8:15 PM
> > > To: Yehuda Yitschak
> > > Cc: Eric Auger; address@hidden; Yuval Caduri; Shadi Ammouri
> > > Subject: Re: Assigning an eth port to a guest VM
> > >
> > > On Mon, 2015-06-15 at 16:52 +0000, Yehuda Yitschak wrote:
> > > >> ________________________________________
> > > >> From: Eric Auger <address@hidden>
> > > >> Sent: Monday, June 15, 2015 4:42 PM
> > > >> To: Yehuda Yitschak; address@hidden
> > > >> Cc: Yuval Caduri; Shadi Ammouri
> > > >> Subject: Re: Assigning an eth port to a guest VM
> > > >>
> > > >> Hi Yehuda,
> > > >> On 06/15/2015 01:01 PM, Yehuda Yitschak wrote:
> > > >> >> Cc: Eric Auger
> > > >> >>
> > > >> >>> -----Original Message-----
> > > >> >>> From: Yehuda Yitschak
> > > >> >>> Sent: Monday, June 15, 2015 9:35
> > > >> >>> To: address@hidden
> > > >> >>> Cc: Yuval Caduri; Shadi Ammouri
> > > >> >>> Subject: Assigning an eth port to a guest VM
> > > >> >>>
> > > >> >>> Hello
> > > >> >>>
> > > >> >>> I would like to ask your advice on how to assign a semi-virtualized
> > > >> >>> Ethernet port to a guest VM
> > > >> >>>
> > > >> >>> The eth port's HW partially supports virtualization since the
> > > >> >>> data path MMIO registers (which control rx/tx operation) are
> > > >> >>> duplicated per VM.
> > > >> >>> So for the run-time operation the guest can directly access the
> > > >> >>> MMIO registers, using VFIO-PLATFORM, and enjoy the
> > > >> >>> performance benefit.
> > > >> >>>
> > > >> >>> However for the initial setup and occasional configuration the
> > > >> >>> guest needs to access control path registers which are shared by
> > > >> >>> all guests.
> > > >> >>> AFAIK this is usually done with HW emulation using trap &
> > > >> >>> emulate with QEMU.
> > > >> >>> So, to the best of my knowledge I need a mix of VFIO and HW
> > > >> >>> emulation to get the port to work with device assignment, right?
> > > >> > Yes to me you're correct.
> > > >> >>>
> > > >> >>> Are there any standard methods for achieving this ?
> > > >> >>> Is there an example for such an existing HW in QEMU ?
> > > >> > Not yet unfortunately. To my knowledge the only platform devices
> > > >> > that were assigned with QEMU VFIO platform were standalone
> > > >> > duplicated devices, PL330, Calxeda Xgmac, SATA. So you are a
> > > >> > trailblazer on that track.
> > > >>
> > > >> Thanks. It's good to know the diagnosis :-)
> > > >>
> > > >> BTW - I thought SR-IOV uses a somewhat similar concept. AFAIK each
> > > >> virtual function (VF) gets a set of registers enabling it to
> > > >> perform data path operations, but most of the configuration and
> > > >> management operations are controlled by the host using the
> > > >> Physical Function (PF) driver.
> > > >> Are you familiar with that?
> > > >> I know SR-IOV is not related to VFIO-PLATFORM, but if the mix of
> > > >> direct access and emulation exists there as well then maybe I can
> > > >> borrow some concepts.
> > >
> > > > > The difference for SR-IOV is that emulation of shared resources is
> > > > > done almost entirely in the hardware.  The PF configures the VFs and
> > > > > may interact with them to some degree at runtime, but VFs are largely
> > > > > separate devices from a software perspective.
> > >
> > > > The first question I would have for your device is whether there is
> > > > IOMMU isolation between the individual "functions".
> > >
> > > Yes. IOMMU isolation is possible.
> > >
> > > > If not, there's really nothing vfio can help with and they probably
> > > > ought to be used more as a macvtap interface.  If there is
> > > > isolation, then I'd assume we'd configure the device for direct
> > > > access to the duplicated registers and trap to QEMU for the
> > > > > emulation portion.  For things where the emulation portion needs to
> > > > > interact with the "PF", interfaces would need to be created in the
> > > > > kernel.
> > >
> > > Can you give a short example of such an interface ?
> > > > Do you mean a special device or ioctl to handle the emulation request
> > > > from QEMU/VFIO?
> > 
> > > It's a trivial example, but with PCI we have a configuration space where the
> > > first 4 bytes expose the vendor and device ID of the device.  With an SR-IOV
> > > VF, these bytes are not populated and are instead provided by the PF via the
> > > SR-IOV capability definition on the PF.  The vfio-pci driver therefore
> > > exposes the static PF-defined vendor and device IDs through the VF config
> > > space.  It's transparent to the user.
> > 
> > > I would hope we wouldn't need any sort of special device or ioctl.  It
> > > sounds like the "PF" registers are separate and distinct from the "VF"
> > > registers, so the "PF" registers could be exposed through a separate VFIO
> > > memory region that does not allow mmap, forcing them to be trapped into
> > > QEMU and emulated in VFIO.
> > 
> > > > The vfio-platform pieces specific to your device might be the
> > > > logical place for that interaction with the PF to occur, ie.
> > > > emulation at the vfio-platform interface rather than in QEMU itself.
> > > > Thanks,
> > >
> > > That sounds simpler than adding QEMU to the mix.
> > > > However for that to happen we need to trap into the vfio-platform driver,
> > > > right? Is that possible?
> > 
> > Yes.  The vfio-platform driver specific to this device would expose a memory
> > region for those "VF" registers that does not allow mmap.  The only access
> > would be via read/write handlers.  You could then emulate/gate/police
> > access to those registers on the "PF" using kernel internal interfaces.  It
> > would be a kernel internal API for accessing the PF registers.  Thanks,
> 
> Eric, Alex,  Thank you very much for all your answers and details.
> From your answers it sounds like I need to extend vfio's resource query
> mechanism to enable flagging certain resources as NO_MAP and then make
> VFIO in QEMU act accordingly.
> That looks like the easier part. The more complex part in my view is to
> manage the trap to the vfio-platform driver and emulate the access.
> 
> In any case, I will take some time to process all this into a solution and
> fill in some gaps in my knowledge.

TBH, I don't see any need to extend VFIO based on your needs so far.
VFIO already has the ability to describe whether a region supports mmap.
If it doesn't support mmap, QEMU has no choice but to use an I/O memory
region and translate VM accesses into reads and writes.  vfio-pci
already makes extensive use of this capability today.  I/O port regions
don't support mmap on x86, so those regions never expose an mmap-capable
flag.  We also require page alignment for mmap, so MMIO regions that are
not page aligned don't expose mmap either.  PCI config space also
does not support mmap because of the emulation and virtualization we do
in that space.  So not supporting mmap is really not uncommon.  You
might want to look at my slides from KVM Forum 2013 [1], which show how
VFIO can really be thought of as a conduit for decomposing a device
through a file descriptor.  QEMU then recomposes it back into a device
through the QEMU driver model.  Since platform devices have no standard
like PCI to provide self discovery, there are device specific drivers on
both ends with vfio-platform.  Thanks,
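
To make the mmap point above concrete, here is a minimal sketch in C of
the two decisions involved: whether a region advertises mmap, and what a
user such as QEMU does when it does not.  The three flag bits mirror the
values defined for struct vfio_region_info in <linux/vfio.h>; everything
else (struct region_desc, the enums, region_flags() and choose_access())
is hypothetical and exists only for illustration.  Requiring both the
base and the size to be page aligned is a simplifying assumption, not a
statement of what the kernel checks.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined for struct vfio_region_info flags in
 * <linux/vfio.h>; repeated here so the sketch builds without kernel
 * headers. */
#define REGION_FLAG_READ  (1u << 0)
#define REGION_FLAG_WRITE (1u << 1)
#define REGION_FLAG_MMAP  (1u << 2)

/* Hypothetical classification of a device region. */
enum region_kind { REGION_MMIO, REGION_IOPORT, REGION_CONFIG };

/* Hypothetical stand-in for the fields of struct vfio_region_info
 * that matter for this decision. */
struct region_desc {
    uint32_t flags;
    uint64_t size;
    uint64_t offset;
};

enum access_mode { ACCESS_NONE, ACCESS_TRAP, ACCESS_DIRECT };

/* Provider side: decide which flags a region advertises.  Only
 * page-aligned MMIO gets the mmap flag (assumption: both base and size
 * must be aligned); I/O port and config regions never do. */
static uint32_t region_flags(enum region_kind kind, uint64_t base,
                             uint64_t size, uint64_t page_size)
{
    uint32_t flags = REGION_FLAG_READ | REGION_FLAG_WRITE;

    /* page_size must be a non-zero power of two */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    if (kind == REGION_MMIO && size != 0 &&
        (base & (page_size - 1)) == 0 &&
        (size & (page_size - 1)) == 0)
        flags |= REGION_FLAG_MMAP;

    return flags;
}

/* User side: map the region directly when mmap is advertised,
 * otherwise fall back to trapping accesses and turning them into
 * reads and writes on the device file descriptor. */
static enum access_mode choose_access(const struct region_desc *r)
{
    if (r->size == 0)
        return ACCESS_NONE;
    if (r->flags & REGION_FLAG_MMAP)
        return ACCESS_DIRECT;
    if (r->flags & (REGION_FLAG_READ | REGION_FLAG_WRITE))
        return ACCESS_TRAP;
    return ACCESS_NONE;
}
```

In this model the duplicated data path registers would end up as
ACCESS_DIRECT and the shared control registers as ACCESS_TRAP, with no
new flag needed.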

Alex

[1] http://www.linux-kvm.org/images/e/ed/Kvm-forum-2013-VFIO-VGA.pdf
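
As an illustration of the "emulate/gate/police" idea discussed earlier
in the thread, the following is a rough user-space model in C of a
read/write handler pair sitting in front of a set of shared control
registers.  It is only a sketch under stated assumptions: the names
(struct reg_policy, struct ctrl_window, window_read(), window_write())
are invented for this example, the registers are modeled as a plain
array rather than real MMIO, and a real vfio-platform driver would use
kernel internal interfaces instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-register policy for one guest. */
struct reg_policy {
    uint32_t read_mask;   /* bits the guest is allowed to observe */
    uint32_t write_mask;  /* bits the guest is allowed to modify */
};

/* Hypothetical non-mmap window onto the shared control registers. */
struct ctrl_window {
    uint32_t *shared;                /* stand-in for the shared registers */
    const struct reg_policy *policy; /* one entry per 32-bit register */
    size_t nregs;
};

/* Reject unaligned or out-of-range accesses. */
static int window_check(const struct ctrl_window *w, uint64_t off)
{
    if ((off & 3) != 0 || off / 4 >= w->nregs)
        return -1;
    return 0;
}

/* Read handler: return only the bits this guest may see. */
static int window_read(const struct ctrl_window *w, uint64_t off,
                       uint32_t *val)
{
    if (window_check(w, off) != 0)
        return -1;
    *val = w->shared[off / 4] & w->policy[off / 4].read_mask;
    return 0;
}

/* Write handler: update only the bits this guest may change and
 * leave every other bit of the shared register untouched. */
static int window_write(struct ctrl_window *w, uint64_t off, uint32_t val)
{
    size_t i;
    uint32_t mask;

    if (window_check(w, off) != 0)
        return -1;
    i = off / 4;
    mask = w->policy[i].write_mask;
    w->shared[i] = (w->shared[i] & ~mask) | (val & mask);
    return 0;
}
```

Because the window does not allow mmap, every guest access would go
through handlers like these, so a register with a zero write_mask is
effectively read-only for that guest.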



