qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB inval


From: Jean-Philippe Brucker
Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
Date: Mon, 3 Jul 2017 12:52:52 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1

Hi Yi,

On 02/07/17 11:06, Liu, Yi L wrote:
> On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote:
> 
> Hi Jean,
> 
> As we've got a few discussions on it. I'd like to have a conclusion and
> make it as a reference for future discussion.
> 
> Currently, we are inclined to have a hybrid format for the iommu tlb
> invalidation from userspace(vIOMMU or userspace driver).
> 
> Based on the previous discussion, may the below work?
> 
> 1. Add a IOCTL for iommu tlb invalidation.
> 
> VFIO_IOMMU_TLB_INVALIDATE
> 
> struct vfio_iommu_tlb_invalidate {
>    __u32   argsz;
>    __u32   length;

Wouldn't argsz be exactly length + 8? Might be redundant in this case.

>    __u8    data[];
> };
> 
> comments from Alex William: is it more suitable to add a new flag bit on
> vfio_device_svm(a structure defined in patch 5 of this patchset), the data
> structure is so similar.
> 
> Personally, I'm ok with it. Pls let me know your thoughts. However, the
> precondition is we accept the whole definition in this email. If not, the
> vfio_iommu_tlb_invalidate would be defined differently.

With this proposal sharing the structure makes sense. As I understand it
we're keeping the VFIO_IOMMU_TLB_INVALIDATE ioctl? In which case adding a
flag bit would be redundant.

> 2. Define a structure in include/uapi/linux/iommu.h(newly added header file)
> 
> struct iommu_tlb_invalidate {
>       __u32   scope;
> /* pasid-selective invalidation described by @pasid */
> #define IOMMU_INVALIDATE_PASID        (1 << 0)
> /* address-selevtive invalidation described by (@vaddr, @size) */
> #define IOMMU_INVALIDATE_VADDR        (1 << 1)
>       __u32   flags;
> /*  targets non-pasid mappings, @pasid is not valid */
> #define IOMMU_INVALIDATE_NO_PASID     (1 << 0)

Although it was my proposal, I don't like this flag. In ARM SMMU, we're
using a special mode where PASID 0 is reserved and any traffic without
PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
to invalidate that special context explicitly. But this means that
invalidation packet targeted at that context will have "scope = PASID" and
"flags = NO_PASID", which is utterly confusing.

I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag
and just use PASID 0 to invalidate this context on ARM. I don't think
other architectures would use the NO_PASID flag anyway, but might be mistaken.

> /* indicating that the pIOMMU doesn't need to invalidate
>    all intermediate tables cached as part of the PTE for
>    vaddr, only the last-level entry (pte). This is a hint. */
> #define IOMMU_INVALIDATE_VADDR_LEAF   (1 << 1)
>       __u32   pasid;
>       __u64   vaddr;
>       __u64   size;
>       __u8    data[];
> };
> 
> For this part, the scope and flags are basically aligned with your previous
> email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and flags
> would be filled by vIOMMU emulator and be parsed by underlying iommu driver,
> it is much more suitable to be defined in a uapi header file.

I tend to agree, defining a single structure in a new IOMMU UAPI file is
better than having identical structures both in uapi/linux/vfio.h and
linux/iommu.h. This way we avoid VFIO having to copy the same structure
field by field. Arch-specific structures that go in
iommu_tlb_invalidate.data also ought to be defined in uapi/linux/iommu.h

> Besides the reason above, I don't want VFIO engae too much on the data 
> parsing.
> If we move the scope,flags,pasid,vaddr,size fields to 
> vfio_iommu_tlb_invalidate,
> then both kernel space vfio and user space vfio needs to do much parsing. So I
> may prefer the way above.

Would the entire validation of struct iommu_tlb_invalidate be offloaded to
the IOMMU subsystem? Checking the structure sizes, copying from user, and
validating the flags?

I guess it's mostly an implementation detail, but it might be better to
keep this code in VFIO as well, even for the validation of
iommu_tlb_invalidate.data (which would require VFIO to keep track of the
model used during bind). This way VFIO sanitizes what comes from
userspace, whilst the IOMMU subsystem only deals with valid kernel
structures, and another subsystem could easily reuse the
iommu_tlb_invalidate API.

The IOMMU subsystem would still validate the meaning of the fields, for
instance whether a given combination of flag is allowed or if the PASID
exists.

Thanks,
Jean

> If you've got any other idea, pls feel free to post it. It's welcomed.
> 
> Thanks,
> Yi L
> 
>> Hi Yi,
>>
>> On 26/04/17 11:12, Liu, Yi L wrote:
>>> From: "Liu, Yi L" <address@hidden>
>>>
>>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB
>>> invalidate request from guest to host.
>>>
>>> In the case of SVM virtualization on VT-d, host IOMMU driver has
>>> no knowledge of caching structure updates unless the guest
>>> invalidation activities are passed down to the host. So a new
>>> IOCTL is needed to propagate the guest cache invalidation through
>>> VFIO.
>>>
>>> Signed-off-by: Liu, Yi L <address@hidden>
>>> ---
>>>  include/uapi/linux/vfio.h | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 6b97987..50c51f8 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -564,6 +564,15 @@ struct vfio_device_svm {
>>>  
>>>  #define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
>>>  
>>> +/* For IOMMU TLB Invalidation Propagation */
>>> +struct vfio_iommu_tlb_invalidate {
>>> +   __u32   argsz;
>>> +   __u32   length;
>>> +   __u8    data[];
>>> +};
>>
>> We initially discussed something a little more generic than this, with
>> most info explicitly described and only pIOMMU-specific quirks and hints
>> in an opaque structure. Out of curiosity, why the change? I'm not against
>> a fully opaque structure, but there seem to be a large overlap between TLB
>> invalidations across architectures.
>>
>>
>> For what it's worth, when prototyping the paravirtualized IOMMU I came up
>> with the following.
>>
>> (From the paravirtualized POV, the SMMU also has to swizzle endianess
>> after unpacking an opaque structure, since userspace doesn't know what's
>> in it and guest might use a different endianess. So we need to force all
>> opaque data to be e.g. little-endian.)
>>
>> struct vfio_iommu_tlb_invalidate {
>>      __u32   argsz;
>>      __u32   scope;
>>      __u32   flags;
>>      __u32   pasid;
>>      __u64   vaddr;
>>      __u64   size;
>>      __u8    data[];
>> };
>>
>> Scope is a bitfields restricting the invalidation scope. By default
>> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr
>> and @size are unused.
>>
>> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation
>> scope to the pasid described by @pasid.
>> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation
>> scope to the address range described by (@vaddr, @size).
>>
>> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA
>> range for *all* pasids (as well as no_pasid). Setting scope =
>> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate
>> the VA range only for @pasid.
>>
>> Flags depend on the selected scope:
>>
>> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either
>> without scope or with INVALIDATE_VADDR) targets non-pasid mappings
>> exclusively (some architectures, e.g. SMMU, allow this)>>
>> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need
>> to invalidate all intermediate tables cached as part of the PTW for vaddr,
>> only the last-level entry (pte). This is a hint.
>>
>> I guess what's missing for Intel IOMMU and would go in @data is the
>> "global" hint (which we don't have in SMMU invalidations). Do you see
>> anything else, that the pIOMMU cannot deduce from this structure?
>>
>> Thanks,
>> Jean
>>
>>
>>> +#define VFIO_IOMMU_TLB_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 23)
>>> +
>>>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>>>  
>>>  /*
>>>
>>
>>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]