
Re: [Qemu-devel] Guest IOMMU and Cisco usnic


From: Alex Williamson
Subject: Re: [Qemu-devel] Guest IOMMU and Cisco usnic
Date: Wed, 12 Feb 2014 12:34:25 -0700

On Wed, 2014-02-12 at 19:10 +0100, Benoît Canet wrote:
> Hi Alex,
> 
> After the IRC conversation we had a few days ago, I understood that a guest
> IOMMU was not implemented.
> 
> I have a real use case for it:
> 
> Cisco usnic allows writing MPI applications that drive the network card from
> userspace in order to optimize latency. It's made for compute clusters.
> 
> The typical cloud provider doesn't provide bare metal access but only VMs on
> top of Cisco's hardware, hence VFIO uses the host IOMMU to pass the NIC
> through to the guest, and no IOMMU is present in the guest.
> 
> Questions: Would writing a performant guest IOMMU implementation be possible?
>            How complex does this project look to someone who knows IOMMU issues?
> 
> The ideal implementation would forward the IOMMU work to the host hardware for
> speed.
> 
> I can devote time writing the feature if it's doable.

Hi Benoît,

I imagine it's doable, but it's certainly not trivial; beyond that I
haven't put much thought into it.

VFIO running in a guest would need an IOMMU that implements both the
IOMMU API and IOMMU groups.  Whether that comes from an emulated
physical IOMMU (like VT-d) or from a new paravirt IOMMU would be for you
to decide.  VT-d would imply using a PCIe chipset like Q35 and either
bandaging VT-d onto it or updating Q35 to something that natively
supports VT-d.  Getting a sufficiently similar PCIe hierarchy between
host and guest would also be required.
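
For reference, this is roughly the host-side flow that whatever IOMMU the
guest sees would have to be able to back.  A minimal sketch against the
existing VFIO ioctls; the group number and PCI address are placeholders:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void)
{
    int container, group, device;
    struct vfio_group_status status = { .argsz = sizeof(status) };

    /* A container represents one IOMMU context (domain). */
    container = open("/dev/vfio/vfio", O_RDWR);

    /* "/dev/vfio/26" stands in for whatever group the device landed in. */
    group = open("/dev/vfio/26", O_RDWR);
    ioctl(group, VFIO_GROUP_GET_STATUS, &status);
    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
        fprintf(stderr,
                "group not viable: bind all group members to vfio-pci\n");
        return 1;
    }

    /* Attach the whole group to the container, pick an IOMMU backend,
     * then grab a file descriptor for the actual device. */
    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
    device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
    return device < 0;
}

The guest kernel would need to be able to build exactly these objects from
the emulated or paravirt IOMMU: groups with known membership and a domain
to attach them to.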

The current model of putting all guest devices in a single IOMMU domain
on the host is likely not what you would want and might imply a new VFIO
IOMMU backend that is better tuned for separate domains, sparse
mappings, and low latency.  VFIO has a modular IOMMU design, so this
isn't architecturally a problem.  The VFIO user (QEMU) is able to select
which backend to use and the code is written with supporting multiple
backends in mind.
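
To make that concrete: the backend is negotiated per container, so a new
backend would just advertise a new extension ID alongside type1.  A sketch
of how the user probes and uses the current one, assuming a freshly opened
container that already has a group attached (as in the snippet above):

#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int map_for_dma(int container, void *vaddr, size_t size)
{
    struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map) };

    /* Probe for the backend before committing the container to it. */
    if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
        return -1;
    if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU))
        return -1;

    /* Identity-map the buffer: IOVA == process virtual address. */
    map.vaddr = (uintptr_t)vaddr;
    map.iova  = (uintptr_t)vaddr;
    map.size  = size;
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}

A backend tuned for a guest IOMMU would presumably keep this interface but
favor frequent, small map/unmap calls over one large static mapping.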

A complication you'll have is that the granularity of IOMMU operations
through VFIO is at the IOMMU group level, so the guest would not be able
to easily split devices grouped together on the host between separate
users in the guest.  That could be modeled as a conventional PCI bridge
masking the requester ID of devices in the guest such that host groups
are mirrored as guest groups.
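
For reference, host group membership is visible in sysfs, and that's the
granularity the mirrored guest groups would have to reproduce; the PCI
address below is just an example:

#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char path[PATH_MAX], target[PATH_MAX];
    ssize_t len;

    /* Each device's group is a symlink to /sys/kernel/iommu_groups/N */
    snprintf(path, sizeof(path),
             "/sys/bus/pci/devices/%s/iommu_group", "0000:06:0d.0");
    len = readlink(path, target, sizeof(target) - 1);
    if (len < 0)
        return 1;
    target[len] = '\0';
    printf("iommu group %s\n", strrchr(target, '/') + 1);
    return 0;
}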

There might also be simpler "punch-through" ways to do it.  For
instance, what if, instead of trying to make it work like it does on the
host, we invented a paravirt VFIO interface and a vfio-pv driver in the
guest populated /dev/vfio with slightly modified passthroughs to the host
fds?  The guest OS might not even need to be aware of the device.

It's an interesting project and certainly a valid use case.  I'd also
like to see things like Intel's DPDK move to using VFIO, but the current
UIO-based DPDK is often used in guests.  Thanks,

Alex



