qemu-devel

Re: [Qemu-devel] [PATCH V2 07/11] virtio-pci: address space translation


From: Jason Wang
Subject: Re: [Qemu-devel] [PATCH V2 07/11] virtio-pci: address space translation service (ATS) support
Date: Fri, 11 Nov 2016 11:26:12 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0



On 2016-11-11 01:32, Michael S. Tsirkin wrote:
On Fri, Nov 04, 2016 at 02:48:20PM +0800, Jason Wang wrote:

On 2016-11-04 03:49, Michael S. Tsirkin wrote:
On Thu, Nov 03, 2016 at 05:27:19PM +0800, Jason Wang wrote:
This patch enables Address Translation Service support for virtio
pci devices. This is needed for a guest-visible Device IOTLB
implementation and will be required by the vhost device IOTLB API
implementation for the Intel IOMMU.

Cc: Michael S. Tsirkin<address@hidden>
Signed-off-by: Jason Wang<address@hidden>
I'd like to understand why you think this is strictly required.
Won't setting the CM bit in the IOMMU do the trick?
ATS was chosen for performance, since CM has several problems:

- CM is slow (10%-20% slower on real hardware for things like netperf)
because each transition between non-present and present mappings needs an
explicit invalidation. It may slow down the whole VM.
- Without ATS/Device IOTLB, the IOMMU becomes a bottleneck because of
contention for IOTLB entries. (What we can do in this case is in fact userspace
IOTLB snooping, which could be done even without CM.)
It was natural to think of ATS when designing the interface between the IOMMU and
device/remote IOTLBs. Do you see any drawbacks to ATS here?

Thanks
In fact at this point I'm confused. Any mapping needs to be programmed
in the IOMMU. We need to implement this correctly.
Once we do, why do we need ATS?
I think what you need is map/unmap notifiers that Aviv is working on.
No?

Let me clarify: the device IOTLB API can work without ATS or CM. So there are three ways to do it:

1) without ATS or CM support, the function could be implemented through:
1.1: asking qemu for help if there's an IOTLB miss in vhost
1.2: snooping the userspace IOTLB invalidation (present to non-present mapping) and updating the device IOTLB

2) with CM enabled, the only thing we can add is snooping the non-present to present mapping and updating the device IOTLB. This is not a requirement since we can still get this by asking qemu for help (1.1).

3) with ATS enabled, the guest knows about the device IOTLB, and device IOTLB entries need to be flushed explicitly by the guest. In this case there's no need to snoop the ordinary IOTLB invalidation as in 1.2. We just need to snoop the device IOTLB specific invalidation requests from the guest.
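
The three options above can be sketched with a toy model. This is only an illustration in Python, not QEMU or vhost code: the class names, the "plain"/"cm"/"ats" mode strings, and the use of a dict lookup as the "ask qemu" path are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class DeviceIOTLB:
    """Toy device-side cache of IOVA -> host address translations."""
    # Hypothetical stand-in for "asking qemu for help" on a miss (1.1).
    translate_miss: Callable[[int], Optional[int]]
    entries: Dict[int, int] = field(default_factory=dict)
    misses: int = 0

    def lookup(self, iova: int) -> int:
        if iova not in self.entries:
            self.misses += 1
            hva = self.translate_miss(iova)
            if hva is None:
                raise LookupError(f"no mapping for IOVA {iova:#x}")
            self.entries[iova] = hva
        return self.entries[iova]

    def update(self, iova: int, hva: int) -> None:
        self.entries[iova] = hva

    def invalidate(self, iova: Optional[int] = None) -> None:
        # iova=None models a global invalidation.
        if iova is None:
            self.entries.clear()
        else:
            self.entries.pop(iova, None)


class ToyIOMMU:
    """Toy guest IOMMU; mode is 'plain' (method 1), 'cm' (2) or 'ats' (3)."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.page_table: Dict[int, int] = {}
        self.dev = DeviceIOTLB(translate_miss=self.page_table.get)

    def map(self, iova: int, hva: int) -> None:
        self.page_table[iova] = hva
        if self.mode == "cm":
            # Method 2: the non-present -> present transition is snooped
            # and pushed to the device IOTLB.
            self.dev.update(iova, hva)

    def unmap(self, iova: int) -> None:
        self.page_table.pop(iova, None)

    def iotlb_invalidate(self, iova: Optional[int] = None) -> None:
        # Ordinary IOTLB invalidation; snooped in methods 1 and 2 (1.2).
        if self.mode in ("plain", "cm"):
            self.dev.invalidate(iova)

    def dev_iotlb_invalidate(self, iova: Optional[int] = None) -> None:
        # Method 3: only the explicit device IOTLB invalidation matters.
        if self.mode == "ats":
            self.dev.invalidate(iova)
```

In this sketch, a "plain" or "cm" instance drops device IOTLB entries whenever the guest issues an ordinary invalidation, while an "ats" instance ignores those and reacts only to dev_iotlb_invalidate().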

All three methods above work, but let's have a look at the performance impact:

- Method 1 (without CM or ATS): performance is not the best because the guest does not know about the remote IOTLB, so device IOTLB entries cannot be flushed on demand. For example, some IOMMU drivers (e.g. Intel) tend to optimize IOTLB invalidations by issuing a global invalidation periodically. We need to flush the device IOTLB too in this case, so we can notice some jitter (because of IOTLB misses).

- Method 2 (with CM but without ATS) seems to be the worst case. It has not only all the problems above but also a new one: each transition needs to be notified to the device explicitly. Even if DPDK uses static mappings, all other devices in the VM use dynamic ones, which slows down the whole system. According to the test, CM is about 10%-20% slower on real hardware.

- Method 3 (ATS) can give the best performance. All the problems above go away since the guest can flush device IOTLB entries on demand. ATS is defined by the spec, was designed to solve exactly the kind of issues we meet here, and is supported by modern IOMMUs.
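
To illustrate the jitter argument for method 1 versus method 3, here is a small, purely illustrative Python sketch. It is not a measurement: the function name, the round-robin access pattern, and all parameters are made up for the example. It counts device IOTLB misses for a device that only uses static mappings while the guest driver periodically issues a global IOTLB invalidation.

```python
def count_misses(static_pages: int, accesses: int,
                 global_flush_every: int, snoop_global: bool) -> int:
    """Count device IOTLB misses for round-robin accesses to static mappings.

    snoop_global=True models method 1: every periodic global IOTLB
    invalidation must also flush the device IOTLB.
    snoop_global=False models method 3: only explicit device IOTLB
    invalidations count, and none are needed for static mappings.
    """
    cache = set()
    misses = 0
    for i in range(accesses):
        # Periodic global invalidation issued by the guest IOMMU driver.
        if snoop_global and i > 0 and i % global_flush_every == 0:
            cache.clear()
        page = i % static_pages
        if page not in cache:
            misses += 1
            cache.add(page)
    return misses


method1 = count_misses(static_pages=4, accesses=100,
                       global_flush_every=10, snoop_global=True)
method3 = count_misses(static_pages=4, accesses=100,
                       global_flush_every=10, snoop_global=False)
print(method1, method3)
```

With these made-up parameters the model reports 40 misses for method 1 and 4 for method 3: the static mappings are refetched after every global flush in the first case and only once in the second.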

Even better, implementing ATS turns out to take less than 100 lines of code, and it is much easier to enable on other IOMMUs (the AMD IOMMU only needs 20 lines of code). All other ways (I started on, and have code for, method 1 for the Intel IOMMU) need lots of work specific to each kind of IOMMU.

Considering how many advantages we get from adding so few lines of code, I don't see why we wouldn't want ATS (for the IOMMUs that support it).

Thanks



Also, could you remind me pls - can guests just disable ATS?

What happens then?