From: Kirti Wankhede
Subject: Re: [Qemu-devel] [PATCH v12 10/22] vfio iommu type1: Add support for mediated devices
Date: Tue, 15 Nov 2016 20:39:25 +0530


On 11/15/2016 4:55 AM, Alex Williamson wrote:
> On Mon, 14 Nov 2016 21:12:24 +0530
> Kirti Wankhede <address@hidden> wrote:
> 
>> VFIO IOMMU drivers are designed for devices which are IOMMU capable.
>> Mediated devices only use the IOMMU APIs; the underlying hardware can be
>> managed by an IOMMU domain.
>>
>> Aim of this change is:
>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>> - To support direct assigned device and mediated device in single module
>>
>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
>> backend module. More details:
>> - Domain for external user is tracked separately in vfio_iommu structure.
>>   It is allocated when group for first mdev device is attached.
>> - Pages pinned for external domain are tracked in each vfio_dma structure
>>   for that iova range.
>> - Page tracking rb-tree in vfio_dma keeps <iova, pfn, ref_count>. Key of
>>   rb-tree is iova, but it actually aims to track pfns.
>> - On an external pin request for an iova, the page is pinned only once;
>>   if the iova is already pinned and tracked, ref_count is incremented.
> 
> This is referring only to the external (ie. pfn_list) tracking,
> correct?  In general, a page can be pinned up to twice per iova
> referencing it, once for an iommu mapped domain and again in the
> pfn_list, right?
> 

That's right.
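To make the ref_count behavior concrete, here is a minimal userspace model of the pfn_list idea (names and the array-backed lookup are illustrative, not the kernel's rb-tree): a second external pin request for an already-tracked iova only bumps ref_count, so the page itself is pinned at most once for the external domain.

```c
#include <stddef.h>

/* Illustrative model: each tracked iova holds one real pin plus a
 * ref_count of outstanding external pin requests. */
#define MAX_VPFN 16

struct vpfn_entry {
	unsigned long iova;
	int ref_count;
	int in_use;
};

static struct vpfn_entry vpfn_list[MAX_VPFN];
static int real_pins;	/* how many times a page was actually pinned */

static struct vpfn_entry *find_vpfn(unsigned long iova)
{
	for (int i = 0; i < MAX_VPFN; i++)
		if (vpfn_list[i].in_use && vpfn_list[i].iova == iova)
			return &vpfn_list[i];
	return NULL;
}

/* Returns the resulting ref_count, or -1 if the tracker is full. */
int pin_page_external(unsigned long iova)
{
	struct vpfn_entry *v = find_vpfn(iova);

	if (v) {
		/* Already tracked: just take another reference. */
		v->ref_count++;
		return v->ref_count;
	}
	for (int i = 0; i < MAX_VPFN; i++) {
		if (!vpfn_list[i].in_use) {
			vpfn_list[i] = (struct vpfn_entry){ iova, 1, 1 };
			real_pins++;	/* only the first request pins */
			return 1;
		}
	}
	return -1;
}

int external_pin_count(void)
{
	return real_pins;
}
```

Two requests for the same iova leave ref_count at 2 but the page pinned once; the iommu-mapped domain can add at most one more pin, matching the "up to twice per iova" description above.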

...
                }
>>  
>> -            if (!rsvd && !lock_cap && mm->locked_vm + i + 1 > limit) {
>> -                    put_pfn(pfn, prot);
>> +            iova = vaddr - dma->vaddr + dma->iova;
> 
> 
> nit, this could be incremented in the for loop just like vaddr, we
> don't need to create it from scratch (iova += PAGE_SIZE).
> 

Ok.
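A quick check that the two formulations agree (a userspace sketch with an assumed 4 KiB page size): recomputing iova from vaddr each iteration is equivalent to deriving it once and stepping it alongside vaddr.

```c
#define PAGE_SIZE_MODEL 4096UL

/* Compares the patch's per-iteration recomputation
 * (vaddr - dma_vaddr + dma_iova) against the suggested
 * iova += PAGE_SIZE stepping; returns 1 if they agree for all pages. */
int iova_formulations_agree(unsigned long dma_vaddr, unsigned long dma_iova,
			    long npage)
{
	unsigned long vaddr = dma_vaddr;
	unsigned long iova = dma_iova;	/* derived once, before the loop */

	for (long i = 0; i < npage;
	     i++, vaddr += PAGE_SIZE_MODEL, iova += PAGE_SIZE_MODEL) {
		if (vaddr - dma_vaddr + dma_iova != iova)
			return 0;
	}
	return 1;
}
```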

> 
>> +            vpfn = vfio_find_vpfn(dma, iova);
>> +            if (!vpfn)
>> +                    lock_acct++;
>> +
>> +            if (!rsvd && !lock_cap &&
>> +                mm->locked_vm + lock_acct + 1 > limit) {
> 
> 
> lock_acct was incremented just above, why is this lock_acct + 1?  I
> think we're off by one here (ie. remove the +1)?
> 

Right, I'll move the vfio_find_vpfn() check to after this limit check, so lock_acct isn't bumped before the +1 test and the page isn't counted twice.
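The corrected ordering can be modeled in userspace like this (names are illustrative; lock_cap and rsvd stand in for the kernel's capability and reserved-page checks): the limit is tested for the page about to be accounted, before lock_acct is incremented, so there is no off-by-one.

```c
#include <stdbool.h>

struct mm_model {
	long locked_vm;		/* pages already accounted to the task */
};

/* Returns true if accounting one more page stays within the limit.
 * A page that is already tracked in the pfn_list, is reserved, or is
 * covered by CAP_IPC_LOCK needs no new accounting. */
bool may_account_page(const struct mm_model *mm, long lock_acct, long limit,
		      bool rsvd, bool lock_cap, bool already_tracked)
{
	if (already_tracked || rsvd || lock_cap)
		return true;
	/* The +1 is this page; lock_acct has NOT been bumped for it yet. */
	return mm->locked_vm + lock_acct + 1 <= limit;
}
```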


...


>> -static long vfio_unpin_pages_remote(struct vfio_dma *dma, unsigned long pfn,
>> -                                long npage, int prot, bool do_accounting)
>> +static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
>> +                                unsigned long pfn, long npage,
>> +                                bool do_accounting)
>>  {
>> -    unsigned long unlocked = 0;
>> +    long unlocked = 0, locked = 0;
>>      long i;
>>  
>> -    for (i = 0; i < npage; i++)
>> -            unlocked += put_pfn(pfn++, prot);
>> +    for (i = 0; i < npage; i++) {
>> +            struct vfio_pfn *vpfn;
>> +
>> +            unlocked += put_pfn(pfn++, dma->prot);
>> +
>> +            vpfn = vfio_find_vpfn(dma, iova + (i << PAGE_SHIFT));
>> +            if (vpfn)
>> +                    locked++;
> 
> This isn't taking into account reserved pages (ex. device mmaps).  In
> the pinning path above we skip accounting of reserved pages, put_pfn
> also only increments for non-reserved pages, but vfio_find_vpfn()
> doesn't care, so it's possible that (locked - unlocked) could result in
> a positive value.  Maybe something like:
> 
> if (put_pfn(...)) {
>       unlocked++;
>       if (vfio_find_vpfn(...))
>               locked++;
> }
> 

Updating.
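The structure Alex suggests can be sketched as a userspace model (put_pfn_model stands in for put_pfn(), which returns 0 for reserved pages): a pfn_list entry only counts toward "locked" when the put actually released a non-reserved pin, so (locked - unlocked) can never go positive.

```c
/* Stand-in for put_pfn(): returns 1 for a real (accounted) page,
 * 0 for a reserved one such as a device mmap. */
static int put_pfn_model(int reserved)
{
	return reserved ? 0 : 1;
}

/* For each page: only bump the counters inside the put_pfn() success
 * branch, mirroring the suggested if (put_pfn(...)) { ... } shape. */
void unpin_account(const int *reserved, const int *in_pfn_list, int npage,
		   long *locked, long *unlocked)
{
	*locked = 0;
	*unlocked = 0;
	for (int i = 0; i < npage; i++) {
		if (put_pfn_model(reserved[i])) {
			(*unlocked)++;
			if (in_pfn_list[i])
				(*locked)++;
		}
	}
}
```

With a reserved page that is nonetheless in the pfn_list, the old code would have counted it as locked without a matching unlock; here it contributes to neither counter.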

...
>>  
>> -static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> +static int vfio_iommu_type1_pin_pages(void *iommu_data,
>> +                                  unsigned long *user_pfn,
>> +                                  int npage, int prot,
>> +                                  unsigned long *phys_pfn)
>> +{
>> +    struct vfio_iommu *iommu = iommu_data;
>> +    int i, j, ret;
>> +    unsigned long remote_vaddr;
>> +    unsigned long *pfn = phys_pfn;
> 
> nit, why do we have this variable rather than using phys_pfn directly?
> Maybe before we had unwind support we incremented this rather than
> indexing it?
> 

Updating.

...

>> +            iova = user_pfn[i] << PAGE_SHIFT;
>> +            dma = vfio_find_dma(iommu, iova, 0);
>> +            if (!dma)
>> +                    goto unpin_exit;
>> +            unlocked += vfio_unpin_page_external(dma, iova, do_accounting);
>> +    }
>> +
>> +unpin_exit:
>> +    mutex_unlock(&iommu->lock);
>> +    return unlocked;
> 
> This is not returning what it's supposed to.  unlocked is going to be
> less than or equal to the number of pages unpinned.  We don't need to
> track unlocked, I think we're just tracking where we are in the unpin
> array, whether it was partial or complete.  I think we want:
> 
> return i > npage ? npage : i;
> 
> Or maybe we can make it more obvious if there's an error on the first
> unpin entry:
> 
> return i > npage ? npage : (i > 0 ? i : -EINVAL);
> 

Thanks for pointing that out. Updating.
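The suggested return value is easy to isolate as a helper (a sketch; 22 models the kernel's EINVAL constant): the caller gets the number of pages actually unpinned, or -EINVAL if the very first lookup failed, instead of the unrelated accounting total.

```c
#define EINVAL_MODEL 22	/* stands in for the kernel's EINVAL */

/* i is how far the unpin loop got; npage is what was requested. */
long unpin_retval(long i, long npage)
{
	return i > npage ? npage : (i > 0 ? i : -EINVAL_MODEL);
}
```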

...
>> +static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
>> +{
>> +    struct rb_node *n, *p;
>> +
>> +    n = rb_first(&iommu->dma_list);
>> +    for (; n; n = rb_next(n)) {
>> +            struct vfio_dma *dma;
>> +            long locked = 0, unlocked = 0;
>> +
>> +            dma = rb_entry(n, struct vfio_dma, node);
>> +            unlocked += vfio_unmap_unpin(iommu, dma, false);
>> +            p = rb_first(&dma->pfn_list);
>> +            for (; p; p = rb_next(p))
>> +                    locked++;
> 
> We don't know that these weren't reserved pages.  If the vendor driver
> was pinning peer-to-peer ranges vfio_unmap_unpin() might have returned
> 0 yet we're assuming each is locked RAM, so our accounting can go
> upside down.
> 

I'll add a reserved-page (rsvd) check here, so only non-reserved vpfns are counted as locked.
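A userspace model of the fixed re-accounting walk (the per-entry rsvd flag is illustrative): only non-reserved pfn_list entries count as locked, so the delta handed to the accounting call cannot go positive when the unmap released nothing for reserved (e.g. peer-to-peer) mappings.

```c
/* rsvd[i] is nonzero if the i-th pfn_list entry maps a reserved page;
 * unlocked is what vfio_unmap_unpin() reported for this vfio_dma. */
long reaccount_delta(const int *rsvd, int n_vpfns, long unlocked)
{
	long locked = 0;

	for (int i = 0; i < n_vpfns; i++)
		if (!rsvd[i])	/* the fix: skip reserved entries */
			locked++;
	return locked - unlocked;
}
```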

>> +            vfio_lock_acct(dma->task, locked - unlocked);
>> +    }
>> +}
>> +
>> +static void vfio_external_unpin_all(struct vfio_iommu *iommu,
>> +                                bool do_accounting)
>> +{
>> +    struct rb_node *n, *p;
>> +
>> +    n = rb_first(&iommu->dma_list);
>> +    for (; n; n = rb_next(n)) {
>> +            struct vfio_dma *dma;
>> +
>> +            dma = rb_entry(n, struct vfio_dma, node);
>> +            while ((p = rb_first(&dma->pfn_list))) {
>> +                    int unlocked;
>> +                    struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
>> +                                                     node);
>> +
>> +                    unlocked = vfio_iova_put_vfio_pfn(dma, vpfn);
>> +                    if (do_accounting)
>> +                            vfio_lock_acct(dma->task, -unlocked);
> 
> nit, we could batch these further, only updating accounting once per
> vfio_dma, or once entirely.
> 

Ok.
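The batching nit can be sketched like this (a userspace model; lock_acct_model stands in for vfio_lock_acct()): accumulate the unpin counts for one vfio_dma and apply a single accounting adjustment, instead of one call per vpfn.

```c
static int acct_calls;	/* counts accounting updates for illustration */

static void lock_acct_model(long npage)
{
	(void)npage;
	acct_calls++;
}

/* Unpins every tracked vpfn of one vfio_dma, then adjusts accounting
 * once with the accumulated total. */
long unpin_all_batched(const long *per_vpfn_unlocked, int n)
{
	long total = 0;

	for (int i = 0; i < n; i++)
		total += per_vpfn_unlocked[i];
	lock_acct_model(-total);	/* one call per vfio_dma */
	return total;
}

int acct_call_count(void)
{
	return acct_calls;
}
```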

Thanks,
Kirti


