From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even if across PDEs
Date: Fri, 18 May 2018 11:41:53 +0800
User-agent: Mutt/1.9.5 (2018-04-13)

On Thu, May 17, 2018 at 04:42:54PM +0200, Auger Eric wrote:
> Hi Peter,
> 
> On 05/04/2018 05:08 AM, Peter Xu wrote:
> > During IOVA page table walking, there is a special case when the PSI
> > covers one whole PDE (Page Directory Entry, which contains 512 Page
> > Table Entries) or more.  In the past, we skipped that entry and did
> > not notify the IOMMU notifiers.  This is not correct: we should send
> > an UNMAP notification to the registered UNMAP notifiers in this case.
> > 
> > For UNMAP-only notifiers, this might leave IOTLB entries cached in the
> > devices even though they are already invalid.  For MAP/UNMAP notifiers
> > like vfio-pci, this will cause stale page mappings.
> > 
> > This special case doesn't trigger often, but it is very easily
> > triggered by nested device assignment, since in that case we'll
> > possibly map the whole L2 guest RAM region into the device's IOVA
> > address space (several GBs at least), which is far bigger than the
> > normal kernel driver usage of the device (tens of MBs normally).
> > 
> > Without this patch applied to L1 QEMU, nested device assignment to L2
> > guests will dump some errors like:
> > 
> > qemu-system-x86_64: VFIO_MAP_DMA: -17
> > qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
> >                     0x7f89a920d000) = -17 (File exists)
> > 
> > Acked-by: Jason Wang <address@hidden>
> > [peterx: rewrite the commit message]
> > Signed-off-by: Peter Xu <address@hidden>
> > ---
> >  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
> >  1 file changed, 30 insertions(+), 12 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index fb31de9416..b359efd6f9 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
> >  
> >  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> >  
> > +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> > +                             vtd_page_walk_hook hook_fn, void *private)
> I find the function name a bit weird as it does not do a page table walk
> but rather calls a callback on an entry. vtd_callback_wrapper?

It's a hook for the page walk process, and IMHO vtd_callback_wrapper
does not really hint at the page walking.  So even if you prefer the
"callback_wrapper" naming I would still prefer:

  vtd_page_walk_callback[_wrapper]

though in that case I don't see much benefit compared to keeping the old
vtd_page_walk_hook, which seems fine to me too...

> > +{
> > +    assert(hook_fn);
> > +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> > +                            entry->addr_mask, entry->perm);
> > +    return hook_fn(entry, private);
> > +}
> > +
> >  /**
> >   * vtd_page_walk_level - walk over specific level for IOVA range
> >   *
> > @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
> >           */
> >          entry_valid = read_cur | write_cur;
> >  
> > +        entry.target_as = &address_space_memory;
> > +        entry.iova = iova & subpage_mask;
> > +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> > +        entry.addr_mask = ~subpage_mask;
> > +
> >          if (vtd_is_last_slpte(slpte, level)) {
> > -            entry.target_as = &address_space_memory;
> > -            entry.iova = iova & subpage_mask;
> >              /* NOTE: this is only meaningful if entry_valid == true */
> >              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> > -            entry.addr_mask = ~subpage_mask;
> > -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> >              if (!entry_valid && !notify_unmap) {
> >                  trace_vtd_page_walk_skip_perm(iova, iova_next);
> >                  goto next;
> >              }
> > -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> > -                                    entry.addr_mask, entry.perm);
> > -            if (hook_fn) {
> > -                ret = hook_fn(&entry, private);
> > -                if (ret < 0) {
> > -                    return ret;
> > -                }
> > +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> > +            if (ret < 0) {
> > +                return ret;
> >              }
> >          } else {
> >              if (!entry_valid) {
> > -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> > +                if (notify_unmap) {
> > +                    /*
> > +                     * The whole entry is invalid; unmap it all.
> > +                     * Translated address is meaningless, zero it.
> > +                     */
> > +                    entry.translated_addr = 0x0;
> do you really need to zero translated_addr and add the related comment?
> As soon as perm is NONE it should not be used.

Yes, we could avoid setting it here.  However, zeroing it makes sure we
don't leak strange numbers to the notifiers below, so I would still
slightly prefer to zero it.
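
To make the concern a bit more concrete, continuing the minimal types
from the sketch earlier in this mail (vfio_like_notify, do_map and
do_unmap are hypothetical stand-ins, not the real vfio code path):

  /* Stand-ins for whatever a MAP/UNMAP notifier really does. */
  static void do_map(uint64_t iova, uint64_t pa, uint64_t size)
  {
      printf("map   0x%" PRIx64 " -> 0x%" PRIx64 " size 0x%" PRIx64 "\n",
             iova, pa, size);
  }

  static void do_unmap(uint64_t iova, uint64_t size)
  {
      printf("unmap 0x%" PRIx64 " size 0x%" PRIx64 "\n", iova, size);
  }

  static int vfio_like_notify(IOMMUTLBEntry *entry, void *private)
  {
      (void)private;
      if (entry->perm == IOMMU_NONE) {
          /* Only iova/addr_mask matter for an unmap; translated_addr
           * should never be looked at on this path.  Zeroing it in the
           * walker means anything that does peek (tracing, debug dumps)
           * sees 0 rather than a stale address from a previous entry. */
          do_unmap(entry->iova, entry->addr_mask + 1);
      } else {
          do_map(entry->iova, entry->translated_addr,
                 entry->addr_mask + 1);
      }
      return 0;
  }

Note that for the whole-PDE case this patch handles, a single such entry
covers the whole range in one UNMAP notification (2MB for 512 4K PTEs,
larger at higher levels), since addr_mask is ~subpage_mask at that level.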

> > +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> > +                    if (ret < 0) {
> > +                        return ret;
> > +                    }
> > +                } else {
> > +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> > +                }
> >                  goto next;
> >              }
> >              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
> > 
> 
> Besides
> Reviewed-by: Eric Auger <address@hidden>

Thanks for reviewing the series.

Note that this v2 is obsolete; please feel free to review version 3 of
this series, which contains quite a lot of functional changes.

Regards,

-- 
Peter Xu
