
Re: [Qemu-devel] VFIO and scheduled SR-IOV cards


From: Don Dutile
Subject: Re: [Qemu-devel] VFIO and scheduled SR-IOV cards
Date: Mon, 03 Jun 2013 14:34:29 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.11) Gecko/20121116 Thunderbird/10.0.11

On 06/03/2013 02:02 PM, Alex Williamson wrote:
On Mon, 2013-06-03 at 18:33 +0200, Benoît Canet wrote:
Hello,

I plan to write a PF driver for an SR-IOV card and make the VFs work with QEMU's
VFIO passthrough so I am asking the following design question before trying to
write and push code.

After SR-IOV is enabled on this hardware, only one VF can be active
at a given time.

Is this actually an SR-IOV device or are you trying to write a driver
that emulates SR-IOV for a PF?

The PF host kernel driver acts as a scheduler.
Every few milliseconds it switches which VF is the currently active function while
disabling the other VFs.

that's time-sharing of hw, which sw doesn't see ... so, ok.
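
(For illustration only, a minimal sketch of what such a time-slicing PF
scheduler could look like as a kernel delayed work item.  struct pf_dev,
vf_quiesce() and vf_activate() are made-up names standing in for the
hardware-specific parts, not anything from an existing driver:)

    #include <linux/workqueue.h>
    #include <linux/jiffies.h>

    #define VF_TIMESLICE_MS 4       /* "every few milliseconds" */

    struct pf_dev {
            struct delayed_work sched_work;
            unsigned int active_vf;
            unsigned int num_vfs;
    };

    /* Hardware-specific context switch; the real body depends on the device. */
    static void vf_quiesce(struct pf_dev *pf, unsigned int vf) { /* hw-specific */ }
    static void vf_activate(struct pf_dev *pf, unsigned int vf) { /* hw-specific */ }

    /* Round-robin the single hardware context between the VFs. */
    static void vf_sched_work(struct work_struct *work)
    {
            struct pf_dev *pf = container_of(to_delayed_work(work),
                                             struct pf_dev, sched_work);

            vf_quiesce(pf, pf->active_vf);
            pf->active_vf = (pf->active_vf + 1) % pf->num_vfs;
            vf_activate(pf, pf->active_vf);

            schedule_delayed_work(&pf->sched_work,
                                  msecs_to_jiffies(VF_TIMESLICE_MS));
    }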

One consequence of how the hardware works is that the MMR regions of the
switched-off VFs must be unmapped, and their I/O accesses should block until the
VF is switched on again.

This violates the spec, and does impact sw -- how can one assign such a VF to
a guest when it does not work independently of other VFs?

MMR = Memory Mapped Register?

This seems contradictory to the SR-IOV spec, which states:

         Each VF contains a non-shared set of physical resources required
         to deliver Function-specific services, e.g., resources such as
         work queues, data buffers, etc. These resources can be directly
         accessed by an SI without requiring VI or SR-PCIM intervention.

Furthermore, each VF should have a separate requester ID.  What's being
suggested here seems like maybe that's not the case.  If true, it would
I didn't read it that way above.  I read it as the PCIe end is timeshared
btwn VFs (& PFs?) ... with some VFs disappearing (from a driver perspective)
as if the device was hot-unplugged w/o notification.  That will probably cause
read timeouts & SME's, bringing down most enterprise-level systems.

make iommu groups challenging.  Is there any VF save/restore around the
scheduling?

Each IOMMU map/unmap should be done in less than 100ns.

I think that may be a lot to ask if we need to unmap the regions in the
guest and in the iommu.  If the "VFs" used different requester IDs,
iommu unmapping wouldn't be necessary.  I experimented with switching
between trapped (read/write) access to memory regions and mmap'd (direct
mapping) for handling legacy interrupts.  There was a noticeable
performance penalty switching per interrupt.
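
(Roughly what that switching looks like on the QEMU side: the mmap'd view of
a BAR sits as an overlapping subregion on top of the trapped one, so disabling
it makes accesses fall back to the read/write callbacks.  A hedged sketch using
the QEMU memory API; the per-BAR structure here is an approximation, not a
quote of the actual vfio code:)

    #include "exec/memory.h"

    /* Approximation of the per-BAR bookkeeping: a slow, trapped region
     * plus an mmap'd subregion layered on top of it. */
    typedef struct SketchBAR {
        MemoryRegion mem;       /* trapped: dispatches to read/write callbacks */
        MemoryRegion mmap_mem;  /* direct mapping of the mmap'd BAR            */
    } SketchBAR;

    static void sketch_bar_set_fast_path(SketchBAR *bar, bool enabled)
    {
        /* When the mmap'd subregion is disabled, guest accesses fall
         * through to bar->mem and are trapped into the VFIO driver. */
        memory_region_set_enabled(&bar->mmap_mem, enabled);
    }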

As the kernel iommu module is being called by the VFIO driver, the PF driver
cannot interface with it.

Currently the only interface of the VFIO code is for the userland QEMU process,
and I fear that notifying QEMU that it should do the unmap/block would take more
than 100ns.

Also, blocking the I/O access in QEMU under the BQL would freeze QEMU.

Do you have an idea of how to write this required map and block/unmap feature?

It seems like there are several options, but I'm doubtful that any of
them will meet 100ns.  If this is completely fake SR-IOV and there's not
a different requester ID per VF, I'd start with seeing if you can even
do the iommu_unmap/iommu_map of the MMIO BARs in under 100ns.  If that's
close to your limit, then your only real option for QEMU is to freeze
it, which still involves getting multiple (maybe many) vCPUs out of VM
mode.  That's not free either.  If by some miracle you have time to
spare, you could remap the regions to trapped mode and let the vCPUs run
while vfio blocks on read/write.
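
(To make the 100ns question concrete, the per-switch work with the kernel
IOMMU API would be roughly the pair below -- the domain, iova and BAR
address/size being whatever VFIO programmed for that region.  A sketch only,
with error handling and locking omitted:)

    #include <linux/iommu.h>

    /* Tear down the guest-visible IOMMU mapping of an inactive VF's BAR.
     * The unmap includes the IOTLB invalidation, which is where most of
     * the time is likely to go. */
    static void vf_bar_unmap(struct iommu_domain *domain,
                             unsigned long iova, size_t bar_size)
    {
            iommu_unmap(domain, iova, bar_size);
    }

    /* Re-establish the mapping when the VF is scheduled back in. */
    static int vf_bar_map(struct iommu_domain *domain, unsigned long iova,
                          phys_addr_t bar_phys, size_t bar_size)
    {
            return iommu_map(domain, iova, bar_phys, bar_size,
                             IOMMU_READ | IOMMU_WRITE);
    }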

Maybe there's even a question whether mmap'd mode is worthwhile for this
device.  Trapping every read/write is orders of magnitude slower, but
allows you to handle the "wait for VF" on the kernel side.
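
(A sketch of what handling the "wait for VF" on the kernel side might look
like in a vendor VFIO region read handler.  struct vf_dev and its fields are
hypothetical bookkeeping kept by the PF scheduler, and the race where the VF
is switched back out right after the wait is ignored here:)

    #include <linux/types.h>
    #include <linux/io.h>
    #include <linux/wait.h>
    #include <linux/uaccess.h>

    struct vf_dev {
            wait_queue_head_t wq;   /* woken by the PF scheduler        */
            bool active;            /* true while this VF owns the hw   */
            void __iomem *mmio;     /* this VF's register window        */
    };

    static ssize_t vf_region_read(struct vf_dev *vf, char __user *buf,
                                  size_t count, loff_t *ppos)
    {
            u32 val;

            if (count < sizeof(val))
                    return -EINVAL;

            /* Sleep until the PF scheduler makes this VF the active one. */
            if (wait_event_interruptible(vf->wq, vf->active))
                    return -ERESTARTSYS;

            val = readl(vf->mmio + *ppos);

            if (copy_to_user(buf, &val, sizeof(val)))
                    return -EFAULT;

            return sizeof(val);
    }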

If you can provide more info on the device design/constraints, maybe we
can come up with better options.  Thanks,

Alex
