qemu-devel

From: Jike Song
Subject: Re: [Qemu-devel] [PATCH v7 3/4] vfio iommu: Add support for mediated devices
Date: Sat, 08 Oct 2016 15:09:15 +0800
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

On 09/30/2016 07:44 PM, Kirti Wankhede wrote:
> On 9/30/2016 8:40 AM, Jike Song wrote:
>> On 09/30/2016 10:58 AM, Jike Song wrote:
>>> On 09/29/2016 11:06 PM, Kirti Wankhede wrote:
>>>>
>>>>
>>>> On 9/29/2016 7:47 AM, Jike Song wrote:
>>>>> +Guangrong
>>>>>
>>>>> On 08/25/2016 11:53 AM, Kirti Wankhede wrote:
>>>>
>>>> ...
>>>>
>>>>>> +static long vfio_iommu_type1_pin_pages(void *iommu_data,
>>>>>> +                                       unsigned long *user_pfn,
>>>>>> +                                       long npage, int prot,
>>>>>> +                                       unsigned long *phys_pfn)
>>>>>> +{
>>>>>> +        struct vfio_iommu *iommu = iommu_data;
>>>>>> +        struct vfio_domain *domain;
>>>>>> +        int i, j, ret;
>>>>>> +        long retpage;
>>>>>> +        unsigned long remote_vaddr;
>>>>>> +        unsigned long *pfn = phys_pfn;
>>>>>> +        struct vfio_dma *dma;
>>>>>> +        bool do_accounting = false;
>>>>>> +
>>>>>> +        if (!iommu || !user_pfn || !phys_pfn)
>>>>>> +                return -EINVAL;
>>>>>> +
>>>>>> +        mutex_lock(&iommu->lock);
>>>>>> +
>>>>>> +        if (!iommu->local_domain) {
>>>>>> +                ret = -EINVAL;
>>>>>> +                goto pin_done;
>>>>>> +        }
>>>>>> +
>>>>>> +        domain = iommu->local_domain;
>>>>>> +
>>>>>> +        /*
>>>>>> +         * If an iommu capable domain exists in the container then all
>>>>>> +         * pages are already pinned and accounted. Accounting should be
>>>>>> +         * done if there is no iommu capable domain in the container.
>>>>>> +         */
>>>>>> +        do_accounting = !IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu);
>>>>>> +
>>>>>> +        for (i = 0; i < npage; i++) {
>>>>>> +                struct vfio_pfn *p;
>>>>>> +                dma_addr_t iova;
>>>>>> +
>>>>>> +                iova = user_pfn[i] << PAGE_SHIFT;
>>>>>> +
>>>>>> +                dma = vfio_find_dma(iommu, iova, 0);
>>>>>> +                if (!dma) {
>>>>>> +                        ret = -EINVAL;
>>>>>> +                        goto pin_unwind;
>>>>>> +                }
>>>>>> +
>>>>>> +                remote_vaddr = dma->vaddr + iova - dma->iova;
>>>>>> +
>>>>>> +                retpage = __vfio_pin_pages_local(domain, remote_vaddr,
>>>>>> +                                                 prot, &pfn[i],
>>>>>> +                                                 do_accounting);
>>>>>
>>>>> Hi Kirti,
>>>>>
>>>>> Here you call __vfio_pin_pages_local() -> vaddr_get_pfn() -> GUP
>>>>> regardless of whether the vaddr is already pinned. That probably means
>>>>> that if the caller calls vfio_pin_pages() with the same GPA multiple
>>>>> times, you get memory leaks.
>>>>>
>>>>> GUP always increases the page refcnt.
>>>>>
>>>>> FWIW, I would like to have the pfn_list_lock implemented with key == iova,
>>>>> so you can always try to find the PFN for a given iova, and pin it only if
>>>>> not found.
>>>>>
>>>>
>>>> I didn't get how there would be a memory leak.
>>>>
>>>> Right, GUP increases the refcnt, so if vfio_pin_pages() is called
>>>> multiple times for the same GPA, the refcnt is incremented. In
>>>> vfio_iommu_type1_pin_pages() the pinned pages list is maintained with a
>>>> ref_count. If the pfn is already in the list, ref_count is incremented,
>>>> and the same is used while unpinning pages.
>>>>
>>>
>>> Let's have a close look at vfio_unpin_pfn:
>>>
>>>     static int vfio_unpin_pfn(struct vfio_domain *domain,
>>>                               struct vfio_pfn *vpfn, bool do_accounting)
>>>     {
>>>             __vfio_unpin_pages_for_mdev(domain, vpfn->pfn, vpfn->prot,
>>>                                         do_accounting);
>>>
>>>             if (atomic_dec_and_test(&vpfn->ref_count))
>>>                     vfio_remove_from_pfn_list(domain, vpfn);
>>>
>>>             return 1;
>>>     }
>>>
>>> Here you don't call __vfio_unpin_pages_for_mdev() -- and thereby
>>> put_page() -- vpfn->ref_count times. If page->_refcount was increased by
>>> GUP N times, here you only bring it back to N-1.
>>>
> 
> A user of vfio_pin_pages() should call vfio_unpin_pages() as well, so here
> we unpin it once. If vfio_pin_pages() is called twice for the same page,
> we should get vfio_unpin_pages() twice for the same page.
>

If this is the deliberate design, why do you need a 'ref_count'? You could
simply drop the 'ref_count' and blame the caller for pinning and unpinning
a different number of times.

> If users of these APIs don't follow this, then
> vfio_release_domain() -> vfio_local_unpin_all() takes care of unpinning,
> decrementing ref_count and deleting the node on (ref_count == 0) for all
> remaining pfns.
>

Here you did pay attention to the "caller doesn't follow this" situation.
However, dealing with 'ref_count' in vfio-iommu is not enough: memory is
still leaked.

>>
>> What's more, since all pinned {iova, pfni} already saved, it's better to
>> consult it before calling GUP, which will get_page() unconditionally.
>
> The pfn is required to unpin a page, so we have the pfn as the key for the
> rbtree. vfio_pin_pages() is called with user_pfn, i.e. iova, which can't
> be used to search the rbtree efficiently. The raw way would be to visit
> each node of the rbtree and check its iova, which would hamper performance
> if this is called in a performance-critical path.
> So the optimized way is to first pin the page, get the pfn, and check
> whether it already exists in the rbtree. If it exists, increment
> ref_count; otherwise add it to the rbtree.
> 

Of course the pfn is required to unpin a page, I 100% agree. But that
doesn't change the argument: using iova as the key, you can still store
the pfn along with it.

By the way, calling GUP unconditionally hurts more than searching the
rbtree.


--
Thanks,
Jike




reply via email to

[Prev in Thread] Current Thread [Next in Thread]