Re: [Qemu-devel] [RFC PATCH 1/8] iommu: Introduce bind_pasid_table API function
From: Liu, Yi L
Date: Tue, 23 May 2017 15:50:49 +0800
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Apr 28, 2017 at 01:51:42PM +0100, Jean-Philippe Brucker wrote:
> On 28/04/17 10:04, Liu, Yi L wrote:
Hi Jean,
Sorry for the delayed response. I still have some follow-up comments on
per-device vs. per-group binding. Please refer to the comments inline.
> > On Wed, Apr 26, 2017 at 05:56:45PM +0100, Jean-Philippe Brucker wrote:
> >> Hi Yi, Jacob,
> >>
> >> On 26/04/17 11:11, Liu, Yi L wrote:
> >>> From: Jacob Pan <address@hidden>
> >>>
> >>> Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use
> >>> case in the guest:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html
> >>>
> >>> As part of the proposed architecture, when an SVM-capable PCI
> >>> device is assigned to a guest, nested mode is turned on. The guest owns the
> >>> first-level page tables (requests with PASID) and performs GVA->GPA
> >>> translation. Second-level page tables are owned by the host for GPA->HPA
> >>> translation of requests both with and without PASID.
> >>>
> >>> A new IOMMU driver interface is therefore needed to perform tasks as
> >>> follows:
> >>> * Enable nested translation and appropriate translation type
> >>> * Assign guest PASID table pointer (in GPA) and size to host IOMMU
> >>>
> >>> This patch introduces new functions called iommu_(un)bind_pasid_table()
> >>> to the IOMMU API. Architecture-specific IOMMU functions can be added later
> >>> to perform the specific steps for binding the PASID table of assigned devices.
> >>>
> >>> This patch also adds a model definition in iommu.h. It is used to
> >>> check whether a bind request comes from a compatible entity; e.g. a bind
> >>> request from an intel_iommu emulator may not be supported by an ARM SMMU
> >>> driver.
> >>>
> >>> Signed-off-by: Jacob Pan <address@hidden>
> >>> Signed-off-by: Liu, Yi L <address@hidden>
> >>> ---
> >>> drivers/iommu/iommu.c | 19 +++++++++++++++++++
> >>> include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++
> >>> 2 files changed, 50 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >>> index dbe7f65..f2da636 100644
> >>> --- a/drivers/iommu/iommu.c
> >>> +++ b/drivers/iommu/iommu.c
> >>> @@ -1134,6 +1134,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> >>> }
> >>> EXPORT_SYMBOL_GPL(iommu_attach_device);
> >>>
> >>> +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
> >>> +			struct pasid_table_info *pasidt_binfo)
> >>
> >> I guess that domain can always be deduced from dev using
> >> iommu_get_domain_for_dev, and doesn't need to be passed as argument?
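To illustrate the suggestion, here is a minimal userspace model of deducing the domain from the device instead of passing it in. All structures and names below (other than the idea of `iommu_get_domain_for_dev`) are simplified stand-ins, not the actual kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: a device belongs to a group, a group is attached
 * to a domain, so bind can look the domain up itself. */
struct iommu_domain { int id; };
struct iommu_group  { struct iommu_domain *domain; };
struct device       { struct iommu_group *group; };

struct pasid_table_info { unsigned long base_gpa; size_t size; };

/* Stand-in for iommu_get_domain_for_dev() */
static struct iommu_domain *get_domain_for_dev(struct device *dev)
{
    return dev->group ? dev->group->domain : NULL;
}

/* bind no longer takes a domain argument; it is deduced from dev.
 * Returns the domain id on success, -1 (think -EINVAL) on error. */
static int bind_pasid_table(struct device *dev,
                            struct pasid_table_info *info)
{
    struct iommu_domain *domain = get_domain_for_dev(dev);

    if (!domain || !info)
        return -1;
    return domain->id;
}
```
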
> >>
> >> For the next version of my SVM series, I was thinking of passing group
> >> instead of device to iommu_bind. Since all devices in a group are expected
> >> to share the same mappings (whether they want it or not), users will have
> >
> > A virtual address space is not tied to a protection domain the way an I/O
> > virtual address space is. Is it really necessary to affect all the devices
> > in this group, or is it just for consistency?
>
> It's mostly about consistency, and also about not hiding implicit behavior in
> the IOMMU driver. I have the following example, described using the group and
> domain structures from the IOMMU API:
> ____________________
> |IOMMU ____________ |
> | |DOM ______ ||
> | | |GRP ||| bind
> | | | A<-----------------Task 1
> | | | B |||
> | | |______|||
> | | ______ ||
> | | |GRP |||
> | | | C |||
> | | |______|||
> | |____________||
> | ____________ |
> | |DOM ______ ||
> | | |GRP |||
> | | | D |||
> | | |______|||
> | |____________||
> |____________________|
>
> Let's take PCI functions A, B, C, and D, all with PASID capabilities. Due
> to some hardware limitation (in the bus, the device or the IOMMU), B can
> see all DMA transactions issued by A. A and B are therefore in the same
> IOMMU group. C and D can be isolated by the IOMMU, so they each have their
> own group.
>
> (As far as I know, in the SVM world at the moment, devices are neatly
> integrated and there is no need for putting multiple devices in the same
> IOMMU group, but I don't think we should expect all future SVM systems to
> be well-behaved.)
>
> So when a user binds Task 1 to device A, it is *implicitly* giving device
> B access to Task 1 as well. Simply because the IOMMU is unable to isolate
> A from B, PASID or not. B could access the same address space as A, even
> if you don't call bind again to explicitly attach the PASID table to B.
>
> If the bind is done with device as argument, maybe users will believe that
> using PASIDs provides an additional level of isolation within a group,
> when it really doesn't. That's why I'm inclined to have the whole bind API
> be on groups rather than devices, if only for clarity.
This may depend on how the user understands the isolation. I think a different
PASID does mean a different address space. From this perspective, it does look
like isolation.
> But I don't know, maybe a comment explaining the above would be sufficient.
>
> To be frank my comment about group versus device is partly to make sure
> that I grasp the various concepts correctly and that we're on the same
> page. Doing the bind on groups is less significant in your case, for PASID
> table binding, because VFIO already takes care of IOMMU group properly. In
> my case I expect DRM, network, DMA drivers to use the API as well for
> binding tasks, and I don't want to introduce ambiguity in the API that
> would lead to security holes later.
For this part, would you provide more detail on why it would be more
significant to bind at the group level in your case? I think we need a strong
reason to support it. Currently, the other map_page APIs take a device as an
argument. Would it also be recommended that they use a group as an argument?
Thanks,
Yi L
> >> to do iommu_group_for_each_dev anyway (as you do in patch 6/8). So it
> >> might be simpler to let the IOMMU core take the group lock and do
> >> group->domain->ops->bind_task(dev...) for each device. The question also
> >> holds for iommu_do_invalidate in patch 3/8.
> >
> > In my understanding, that means moving the for_each_dev loop into the IOMMU
> > driver. Is that right?
>
> Yes, that's what I meant
>
> >> This way the prototypes would be:
> >> int iommu_bind...(struct iommu_group *group, struct ... *info)
> >> int iommu_unbind...(struct iommu_group *group, struct ...*info)
> >> int iommu_invalidate...(struct iommu_group *group, struct ...*info)
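A sketch of what the group-granularity dispatch could look like, with the core looping over the group's members the way `iommu_group_for_each_dev` would, so a caller cannot forget sibling devices. Everything here is an illustrative userspace model, not the kernel implementation:

```c
#include <assert.h>

#define MAX_DEVS 4

/* Simplified model of a group and its member devices. */
struct device { int bound; };
struct iommu_group {
    struct device *devs[MAX_DEVS];
    int ndevs;
};

/* Stand-in for the per-device driver op (domain->ops->bind_task(dev...)). */
static int driver_bind_one(struct device *dev)
{
    dev->bound = 1;
    return 0;
}

/* Stand-in for iommu_bind...(struct iommu_group *group, ...): the core
 * applies the driver op to every device in the group, so binding is
 * explicitly a group-wide operation. */
static int group_bind(struct iommu_group *group)
{
    for (int i = 0; i < group->ndevs; i++) {
        int ret = driver_bind_one(group->devs[i]);

        if (ret)
            return ret;
    }
    return 0;
}
```

This makes the implicit sharing within a group (A and B in the diagram above) visible in the API surface rather than hidden in the driver.
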
> >
> > For PASID table binding from the guest, I think it'd better be a per-device op,
> > since the bind operation needs to modify the host context entry. But we may
> > still share the API and do things differently in the IOMMU driver.
>
> Sure, as said above the use cases for PASID table and single PASID binding
> are different, sharing the API is not strictly necessary.
>
> > For invalidation, I think it'd better be per-group. Actually, when a guest
> > IOMMU exists, there is only one group in a domain on Intel platforms. Doing
> > it for each device is not expected. How about on ARM?
>
> In ARM systems with the DMA API (IOMMU_DOMAIN_DMA), there is one group per
> domain. But with VFIO (IOMMU_DOMAIN_UNMANAGED), VFIO will try to attach
> multiple groups in the same container to the same domain when possible.
>
> >> For PASID table binding it might not matter much, as VFIO will most likely
> >> be the only user. But task binding will be called by device drivers, which
> >> by now should be encouraged to do things at iommu_group granularity.
> >> Alternatively it could be done implicitly like in iommu_attach_device,
> >> with "iommu_bind_device_x" calling "iommu_bind_group_x".
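The implicit alternative mentioned above can be modeled as a thin per-device wrapper that resolves the group and forwards, mirroring how `iommu_attach_device` relates to its group variant. Again, all names and structures below are hypothetical stand-ins for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: the group counts bind calls so we can observe
 * that device-level calls land on the group. */
struct iommu_group { int bind_calls; };
struct device      { struct iommu_group *group; };

/* Stand-in for the group-level "iommu_bind_group_x". */
static int iommu_bind_group_x(struct iommu_group *group)
{
    group->bind_calls++;    /* the group-wide bind happens here */
    return 0;
}

/* Stand-in for "iommu_bind_device_x": device users keep a familiar
 * per-device entry point, but group semantics still apply because the
 * wrapper simply delegates. Returns -1 (think -ENODEV) if the device
 * has no group. */
static int iommu_bind_device_x(struct device *dev)
{
    if (!dev->group)
        return -1;
    return iommu_bind_group_x(dev->group);
}
```
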
> >
> > Do you mean the bind-task request from a userspace driver? I guess you're
> > trying to handle different types of binding requests in a single svm_bind API?
> >
> >>
> >> Extending this reasoning, since groups in a domain are also supposed to
> >> have the same mappings, then similarly to map/unmap,
> >> bind/unbind/invalidate should really be done with an iommu_domain (and
> >> nothing else) as target argument. However this requires the IOMMU core to
> >> keep a group list in each domain, which might complicate things a little
> >> too much.
> >>
> >> But "all devices in a domain share the same PASID table" is the paradigm
> >> I'm currently using in the guts of arm-smmu-v3. And I wonder if, as with
> >> iommu_group, it should be made more explicit to users, so they don't
> >> assume that devices within a domain are isolated from each others with
> >> regard to PASID DMA.
> >
> > Does the isolation you mentioned mean forbidding PASID DMA to the same
> > virtual address space when the devices come from different domains?
>
> In the above example, devices A, B and C are in the same IOMMU domain
> (because, for instance, the user put the two groups in the same VFIO
> container.) Then in the SMMUv3 driver they would all share the same PASID
> table. A, B and C can access Task 1 with the PASID obtained during the
> depicted bind. They don't need to call bind again for device C, though it
> would be good practice.
>
> But D is in a different domain, so unless you also call bind on Task 1 for
> device D, there is no way that D can access Task 1.
>
> Thanks,
> Jean
>