[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even if a
From: |
Auger Eric |
Subject: |
Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even if across PDEs |
Date: |
Fri, 18 May 2018 09:39:42 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 |
Hi,
On 05/18/2018 05:41 AM, Peter Xu wrote:
> On Thu, May 17, 2018 at 04:42:54PM +0200, Auger Eric wrote:
>> Hi Peter,
>>
>> On 05/04/2018 05:08 AM, Peter Xu wrote:
>>> During IOVA page table walking, there is a special case when the PSI
>>> covers one whole PDE (Page Directory Entry, which contains 512 Page
>>> Table Entries) or more. In the past, we skip that entry and we don't
>>> notify the IOMMU notifiers. This is not correct. We should send UNMAP
>>> notification to registered UNMAP notifiers in this case.
>>>
>>> For UNMAP only notifiers, this might cause IOTLBs cached in the devices
>>> even if they were already invalid. For MAP/UNMAP notifiers like
>>> vfio-pci, this will cause stale page mappings.
>>>
>>> This special case doesn't trigger often, but it is very easy to be
>>> triggered by nested device assignments, since in that case we'll
>>> possibly map the whole L2 guest RAM region into the device's IOVA
>>> address space (several GBs at least), which is far bigger than normal
>>> kernel driver usages of the device (tens of MBs normally).
>>>
>>> Without this patch applied to L1 QEMU, nested device assignment to L2
>>> guests will dump some errors like:
>>>
>>> qemu-system-x86_64: VFIO_MAP_DMA: -17
>>> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>>> 0x7f89a920d000) = -17 (File exists)
>>>
>>> Acked-by: Jason Wang <address@hidden>
>>> [peterx: rewrite the commit message]
>>> Signed-off-by: Peter Xu <address@hidden>
>>> ---
>>> hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
>>> 1 file changed, 30 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index fb31de9416..b359efd6f9 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce,
>>> uint64_t iova, bool is_write,
>>>
>>> typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
>>>
>>> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
>>> + vtd_page_walk_hook hook_fn, void *private)
>> I find the function name a bit weird as it does not does a ptw but
>> rather call a callback on an entry. vtd_callback_wrapper?
>
> It's a hook for the page walk process, and IMHO vtd_callback_wrapper
> does not really provide any hint for the page walking. So even if you
> prefer the "callback_wrapper" naming I would still more prefer:
>
> vtd_page_walk_callback[_wrapper]
>
> though if so I'd say I don't see much benefit comparing to use the old
> vtd_page_walk_hook, which seems fine to me too...
I preferred vtd_page_walk_hook too.
Thanks
Eric
>
>>> +{
>>> + assert(hook_fn);
>>> + trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
>>> + entry->addr_mask, entry->perm);
>>> + return hook_fn(entry, private);
>>> +}
>>> +
>>> /**
>>> * vtd_page_walk_level - walk over specific level for IOVA range
>>> *
>>> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr,
>>> uint64_t start,
>>> */
>>> entry_valid = read_cur | write_cur;
>>>
>>> + entry.target_as = &address_space_memory;
>>> + entry.iova = iova & subpage_mask;
>>> + entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>>> + entry.addr_mask = ~subpage_mask;
>>> +
>>> if (vtd_is_last_slpte(slpte, level)) {
>>> - entry.target_as = &address_space_memory;
>>> - entry.iova = iova & subpage_mask;
>>> /* NOTE: this is only meaningful if entry_valid == true */
>>> entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
>>> - entry.addr_mask = ~subpage_mask;
>>> - entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>>> if (!entry_valid && !notify_unmap) {
>>> trace_vtd_page_walk_skip_perm(iova, iova_next);
>>> goto next;
>>> }
>>> - trace_vtd_page_walk_one(level, entry.iova,
>>> entry.translated_addr,
>>> - entry.addr_mask, entry.perm);
>>> - if (hook_fn) {
>>> - ret = hook_fn(&entry, private);
>>> - if (ret < 0) {
>>> - return ret;
>>> - }
>>> + ret = vtd_page_walk_one(&entry, level, hook_fn, private);
>>> + if (ret < 0) {
>>> + return ret;
>>> }
>>> } else {
>>> if (!entry_valid) {
>>> - trace_vtd_page_walk_skip_perm(iova, iova_next);
>>> + if (notify_unmap) {
>>> + /*
>>> + * The whole entry is invalid; unmap it all.
>>> + * Translated address is meaningless, zero it.
>>> + */
>>> + entry.translated_addr = 0x0;
>> do you really need to zero the translated_addr and the related comment.
>> As soon as perm is NONE this should not be used?
>
> Yes here we can avoid setting it. However that'll make sure we don't
> leak strange numbers to the below notifiers, so I would still slightly
> prefer to zero it here.
>
>>> + ret = vtd_page_walk_one(&entry, level, hook_fn,
>>> private);
>>> + if (ret < 0) {
>>> + return ret;
>>> + }
>>> + } else {
>>> + trace_vtd_page_walk_skip_perm(iova, iova_next);
>>> + }
>>> goto next;
>>> }
>>> ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
>>>
>>
>> Besides
>> Reviewed-by: Eric Auger <address@hidden>
>
> Thanks for reviewing the series.
>
> Note that this v2 is obsolete, please feel free to read version 3 of
> this series, which contains quite a lot of functional changes.
>
> Regards,
>
[Qemu-devel] [PATCH v2 04/10] intel-iommu: only do page walk for MAP notifiers, Peter Xu, 2018/05/03