From: Avi Kivity
Subject: Re: [Qemu-devel] [RFC v1 7/7] vhost: abort if an emulated iommu is used
Date: Mon, 15 Oct 2012 12:24:45 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1

On 10/11/2012 09:38 PM, Alex Williamson wrote:
> On Thu, 2012-10-11 at 17:48 +0200, Avi Kivity wrote:
>> On 10/11/2012 05:34 PM, Michael S. Tsirkin wrote:
>> > On Thu, Oct 11, 2012 at 04:35:23PM +0200, Avi Kivity wrote:
>> >> On 10/11/2012 04:35 PM, Michael S. Tsirkin wrote:
>> >> 
>> >> >> No, qemu should configure virtio devices to bypass the iommu,
>> >> >> even if it is on.
>> >> > 
>> >> > Okay so there will be some API that virtio devices should call
>> >> > to achieve this?
>> >> 
>> >> The iommu should probably call pci_device_bypasses_iommu() to check for
>> >> such devices.
>> > 
>> > So maybe this patch should depend on the introduction of such
>> > an API.
>> 
>> I've dropped it for now.
>> 
>> In fact, virtio/vhost are safe since they use cpu_physical_memory_rw()
>> and the memory listener watches address_space_memory, no iommu there.
>> vfio needs to change to listen to pci_dev->bus_master_as, and needs
>> special handling for iommu regions (abort for now, type 2 iommu later).
> 
> I don't see how we can ever support an assigned device with the
> translate function.  

We cannot.

> Don't we want a flat address space at run time
> anyway?  

Not if we want vfio-in-the-guest (for nested virt or OS bypass).

> IOMMU drivers go to pains to make IOTLB updates efficient and
> drivers optimize for long running translations, but here we impose a
> penalty on every access.  I think we'd be more efficient and better able
> to support assigned devices if the per device/bus address space was
> updated and flattened when it changes.  

A flattened address space cannot be efficiently implemented with a
->translate() callback.  Describing the transformed address space
requires walking all the iommu page tables; these can change very
frequently for some use cases; the io page tables can be built after the
iommu is configured but before dma is initiated, so you have no hook
from which to call ->translate(); and the representation of the address
space can be huge.
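
To make the "huge" point concrete, a flattened view would have to be an
explicit range list along the lines below, rebuilt by walking the io
page tables whenever they change.  This is purely illustrative; none of
these names exist in qemu:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Illustrative only: what a "flattened" view of the bus address space
   * would have to carry.  For a fragmented, 4K-granular mapping this
   * list can be enormous, and it must be rebuilt from the io page
   * tables on every change. */
  typedef struct FlatIOMMURange {
      uint64_t iova;        /* bus address as seen by the device       */
      uint64_t gpa;         /* guest-physical address it translates to */
      uint64_t len;
      bool     writeable;
  } FlatIOMMURange;

  typedef struct FlatIOMMUView {
      FlatIOMMURange *ranges;
      size_t          nr_ranges;
  } FlatIOMMUView;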

> Being able to implement an XOR
> IOMMU is impressive, but is it practical?  

The XOR IOMMU is just a way for me to test and demonstrate the API.
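
Its translation boils down to roughly the following (the actual patch in
this series differs in detail, and the mask value here is made up):

  #include <stdint.h>

  /* Toy translation: a pure function of the input address.  Trivial to
   * express through a per-access ->translate() hook, but a flattened
   * range list for it would be pathologically fragmented. */
  #define XOR_IOMMU_MASK 0x0000f000ull   /* illustrative value only */

  static uint64_t xor_iommu_translate(void *opaque, uint64_t addr)
  {
      (void)opaque;                      /* no per-device state needed */
      return addr ^ XOR_IOMMU_MASK;
  }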

> We could be doing much more
> practical things like nested device assignment with a flatten
> translation ;)  Thanks,

No, a flattened translation is impractical, at least when driven from qemu.

My plans wrt vfio/kvm here are to have memory_region_init_iommu()
provide, in addition to ->translate(), a declarative description of the
translation function.  In practical terms, this means that the API will
receive the name of the spec that the iommu implements:

  MemoryRegionIOMMUOps amd_iommu_v2_ops = {
      .translate = amd_iommu_v2_translate,   /* per-access callback */
      .translation_type = IOMMU_AMD_V2,      /* names the spec implemented */
  };

qemu-side vfio would then match ->translation_type against what the
kernel provides and configure the kernel for that type of translation.
Since some v2 hardware supports two levels of translation, all vfio has
to do is set up the lower translation level to match the guest->host
translation (which it does already) and set up the upper translation
level to follow the guest configuration.  From then on the hardware does
the rest.
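
Concretely, the qemu-side matching step would look something like the
sketch below, with guest_type being the ->translation_type declared
above.  Every identifier here that is not plain C is hypothetical;
neither the declarative iommu ops nor a nested vfio interface exist yet:

  enum iommu_translation_type {
      IOMMU_NONE,
      IOMMU_AMD_V2,         /* hardware with two translation levels */
  };

  struct vfio_group;        /* stand-in for the real vfio state */

  /* Assumed helpers, not real qemu/vfio API: */
  extern enum iommu_translation_type vfio_kernel_iommu_type(struct vfio_group *g);
  extern int vfio_enable_nested_iommu(struct vfio_group *g,
                                      enum iommu_translation_type type);

  static int vfio_match_guest_iommu(struct vfio_group *group,
                                    enum iommu_translation_type guest_type)
  {
      /* The lower level (gpa->hpa) is already programmed by vfio; here
       * we only check that the host iommu can also follow the guest's
       * upper-level tables, and turn that mode on. */
      if (vfio_kernel_iommu_type(group) != guest_type) {
          return -1;        /* no nested device assignment, fall back */
      }
      return vfio_enable_nested_iommu(group, guest_type);
  }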

If the hardware supports only one translation level, we may still be
able to implement a nested iommu using the same technique we use for the
processor page tables - shadowing.  kvm would write-protect the iommu
page tables and pass any updates to vfio, which would update the shadow
io page tables that implement the ngpa->gpa->hpa translation.  However,
given the complexity and performance problems on one side, and the size
of the niche that nested device assignment serves on the other, we'll
probably limit ourselves to hardware that supports two levels of
translation.  If nested virtualization really takes off we can use
shadowing to provide the guest with emulated hardware that supports two
translation levels (the solution above uses host hardware with two
levels to expose guest hardware with one level).
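
For completeness, the shadowing flow would be along these lines; all of
the hooks are hypothetical:

  #include <stdint.h>

  /* Assumed helpers, for illustration only: */
  extern uint64_t pte_to_gpa(uint64_t pte);    /* decode guest io pte   */
  extern int      pte_to_flags(uint64_t pte);
  extern uint64_t gpa_to_hpa(uint64_t gpa);    /* existing vfio mapping */
  extern void     vfio_shadow_map(uint64_t ngpa, uint64_t hpa, int flags);

  /* Called when kvm traps a write to a write-protected guest io page
   * table entry: fold the guest's ngpa->gpa mapping and the existing
   * gpa->hpa mapping into one shadow entry the hardware can walk. */
  static void shadow_io_pte_update(uint64_t ngpa, uint64_t guest_pte)
  {
      uint64_t gpa = pte_to_gpa(guest_pte);
      uint64_t hpa = gpa_to_hpa(gpa);

      vfio_shadow_map(ngpa, hpa, pte_to_flags(guest_pte));
  }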

-- 
error compiling committee.c: too many arguments to function


