From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation infrastructure
Date: Tue, 15 May 2012 18:58:42 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1
On 05/15/2012 06:08 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2012-05-15 at 17:02 -0500, Anthony Liguori wrote:
>> "6.2.1 Register Based Invalidation Interface
>>
>> The register based invalidations provides a synchronous hardware
>> interface for invalidations. Software is expected to write to the IOTLB
>> registers to submit invalidation command and may poll on these registers
>> to check for invalidation completion. For optimal performance, hardware
>> implementations are recommended to complete an invalidation request with
>> minimal latency"
>>
>> This makes perfect sense. You write to an MMIO location to request
>> invalidation and then *poll* on a separate register for completion. It's
>> not a single MMIO operation that has an indefinite return duration.
>
> Sure, it's an implementation detail; I never meant that it had to be a
> single blocking register access. All I said is that the HW must provide
> such a mechanism, which is typically used synchronously by the operating
> system. Polling for completion is a perfectly legitimate way to do it;
> that's how we do it on the Apple G5 "DART" iommu as well.
>
> The fact that MMIO operations can block is orthogonal. It is possible,
> however, especially with ancient PIO devices.
Even ancient PIO devices really don't block indefinitely.
> In our case (TCEs) it's a hypervisor call, not an MMIO op, so to some
> extent it's even more likely to do "blocking" things.
Yes, so I think the right thing to do is not to model hypercalls for sPAPR as synchronous calls, but rather as asynchronous calls. Obviously, simple ones can use a synchronous implementation...

This is a matter of setting hlt=1 before dispatching the hypercall and passing a continuation to the call that, when executed, prepares the CPUState for the hypercall return and then sets hlt=0 to resume the CPU.
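A minimal sketch of that halt-and-continue pattern, with illustrative names (the trimmed-down CPUState, dispatch_async_hcall, and hcall_complete are assumptions for the example, not actual QEMU code):

```c
/* Sketch: block the vcpu across an asynchronous hypercall, then let a
 * continuation fill in the return value and wake it. Names here are
 * hypothetical; only the halted-flag idea mirrors the discussion. */
typedef struct CPUState {
    int halted;         /* hlt flag: 1 while the hypercall is in flight */
    long hcall_retval;  /* value the guest sees when the hcall returns */
} CPUState;

typedef void (*hcall_continuation)(CPUState *cpu, long ret);

/* The continuation: prepare the CPUState for the hypercall return,
 * then clear hlt to resume the vcpu. */
static void hcall_complete(CPUState *cpu, long ret)
{
    cpu->hcall_retval = ret;
    cpu->halted = 0;
}

static void dispatch_async_hcall(CPUState *cpu, hcall_continuation cont)
{
    cpu->halted = 1;    /* stop executing guest code on this vcpu */
    /* ...kick off the asynchronous work, stashing 'cont'; the back end
     * invokes cont(cpu, ret) when it finishes. For this demo we
     * complete immediately with a success code. */
    cont(cpu, 0);
}
```

A simple synchronous implementation falls out naturally: it just invokes the continuation before returning, as the demo does.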
> It would have been possible to implement a "busy" return status with the
> guest having to try again; unfortunately that's not how Linux has
> implemented it, so we are stuck with the current semantics. Now, if you
> think that dropping the lock isn't good, what do you reckon I should do?
Add a reference count to dma map calls and a flush_pending flag. If flush_pending && ref > 0, return NULL for all map calls.

Decrement ref on unmap, and if ref == 0 and flush_pending is set, clear flush_pending. You could add a flush_notifier for this event too.

dma_flush() sets flush_pending if ref > 0. Your TCE flush hypercall would register for flush notifications and squirrel away the hypercall completion continuation.
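The three rules above might look something like this; all the names (IOMMUFlushState, iommu_dma_map, and friends) are made up for the sketch and are not the QEMU DMA API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*flush_notify_fn)(void *opaque);

typedef struct {
    int ref;                /* outstanding dma map calls */
    bool flush_pending;     /* a flush was requested while ref > 0 */
    flush_notify_fn notify; /* run when the deferred flush completes */
    void *notify_opaque;
} IOMMUFlushState;

/* Map fails (returns NULL) while a flush is pending with maps still
 * outstanding, so no new mappings can outlive the flush. */
void *iommu_dma_map(IOMMUFlushState *s)
{
    if (s->flush_pending && s->ref > 0) {
        return NULL;        /* caller must retry after the flush */
    }
    s->ref++;
    return (void *)1;       /* stand-in for a real mapping */
}

void iommu_dma_unmap(IOMMUFlushState *s)
{
    assert(s->ref > 0);
    if (--s->ref == 0 && s->flush_pending) {
        s->flush_pending = false;
        if (s->notify) {
            s->notify(s->notify_opaque); /* e.g. finish the hypercall */
        }
    }
}

/* Returns true if the flush completed synchronously (no outstanding
 * maps); otherwise records the notifier and defers to the last unmap. */
bool iommu_dma_flush(IOMMUFlushState *s, flush_notify_fn cb, void *opaque)
{
    if (s->ref == 0) {
        return true;
    }
    s->flush_pending = true;
    s->notify = cb;
    s->notify_opaque = opaque;
    return false;
}
```

The TCE flush hypercall would pass its completion continuation as the notifier and return to the main loop when iommu_dma_flush() reports a deferred flush.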
VT-d actually has a concept of an invalidation completion queue which delivers interrupt-based notification of invalidation completion events. The above flush_notify would be the natural way to support this, since in that case there is no VCPU event that's directly involved in the completion.
Regards, Anthony Liguori