qemu-devel

Re: [PATCH RFC 1/6] i386/pc: Account IOVA reserved ranges above 4G boundary


From: Igor Mammedov
Subject: Re: [PATCH RFC 1/6] i386/pc: Account IOVA reserved ranges above 4G boundary
Date: Mon, 28 Jun 2021 15:25:50 +0200

On Wed, 23 Jun 2021 14:07:29 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:

> On 6/23/21 1:09 PM, Igor Mammedov wrote:
> > On Wed, 23 Jun 2021 10:51:59 +0100
> > Joao Martins <joao.m.martins@oracle.com> wrote:
> >   
> >> On 6/23/21 10:03 AM, Igor Mammedov wrote:  
> >>> On Tue, 22 Jun 2021 16:49:00 +0100
> >>> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>>     
> >>>> It is assumed that the whole GPA space is available to be DMA
> >>>> addressable, within a given address space limit. Since Linux v5.4
> >>>> that is no longer true, and VFIO will validate whether the selected
> >>>> IOVA is indeed valid, i.e. not reserved by the IOMMU on behalf of
> >>>> some specific devices, or platform-defined.
> >>>>
> >>>> AMD systems with an IOMMU are an example of such a platform, and
> >>>> in particular may export only these ranges as allowed:
> >>>>
> >>>>  0000000000000000 - 00000000fedfffff (0      .. 3.982G)
> >>>>  00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
> >>>>  0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb)
> >>>>
> >>>> We already account for the 4G hole, but a guest bigger than ~1010G
> >>>> will fail to allocate, given the ~12G hole at the 1Tb boundary,
> >>>> which is reserved for HyperTransport.
> >>>>
> >>>> When creating the region above 4G, take into account which IOVAs
> >>>> are allowed, by defining the known allowed ranges and searching for
> >>>> the next free IOVA range. When an invalid IOVA is found, mark it as
> >>>> reserved and proceed to the next allowed IOVA region.
> >>>>
> >>>> After accounting for the 1Tb hole on AMD hosts, mtree should
> >>>> look like:
> >>>>
> >>>> 0000000100000000-000000fcffffffff (prio 0, i/o):
> >>>>  alias ram-above-4g @pc.ram 0000000080000000-000000fc7fffffff
> >>>> 0000010000000000-000001037fffffff (prio 0, i/o):
> >>>>  alias ram-above-1t @pc.ram 000000fc80000000-000000ffffffffff    
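
For illustration only, here is a minimal sketch (not the patch itself; the
constants, helper name and split logic are assumptions) of how the RAM above
4G could be split into the two aliases shown in the mtree above, skipping the
reserved HyperTransport window:

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Assumed constants: first reserved GPA of the HT window and the first
 * GPA after it, taken from the ranges quoted above. */
#define AMD_HT_START    0xfd00000000ULL    /* 1012G */
#define ABOVE_1T_START  0x10000000000ULL   /* 1T    */

/* Hypothetical helper: map 'above_4g_size' bytes of 'ram' (starting at
 * offset 'below_4g_size') into the GPA space, skipping the HT window. */
static void map_ram_above_4g(MemoryRegion *system_memory, MemoryRegion *ram,
                             hwaddr below_4g_size, hwaddr above_4g_size)
{
    hwaddr gpa = 0x100000000ULL;                         /* 4G */
    hwaddr chunk = MIN(above_4g_size, AMD_HT_START - gpa);
    MemoryRegion *above_4g = g_new(MemoryRegion, 1);

    /* First alias: 4G up to the start of the reserved HT window. */
    memory_region_init_alias(above_4g, NULL, "ram-above-4g", ram,
                             below_4g_size, chunk);
    memory_region_add_subregion(system_memory, gpa, above_4g);

    if (above_4g_size > chunk) {
        /* Remainder is relocated above the 1Tb boundary. */
        MemoryRegion *above_1t = g_new(MemoryRegion, 1);

        memory_region_init_alias(above_1t, NULL, "ram-above-1t", ram,
                                 below_4g_size + chunk,
                                 above_4g_size - chunk);
        memory_region_add_subregion(system_memory, ABOVE_1T_START, above_1t);
    }
}

The series as described derives this from a list of known allowed ranges
rather than a single hard-coded cutoff.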
> >>>
> >>> You are talking here about GPA, which is a guest-specific thing,
> >>> and then somehow it becomes tied to the host. For bystanders it's
> >>> not clear from the above commit message how the two are related.
> >>> I'd add an explicit explanation of how the AMD host is related to
> >>> GPAs, and clarify where you are talking about the guest vs. the host side.
> >>>     
> >> OK, makes sense.
> >>
> >> Perhaps using IOVA makes it easier to understand. I said GPA because
> >> there's a 1:1 mapping between GPA and IOVA (if you're not using a vIOMMU).
> > 
> > IOVA may be too broad a term; maybe explain it in terms of GPA and HPA
> > and why it matters on each side (host/guest).
> >   
> 
> I used the term IOVA specifically because it applies to both host IOVA and
> guest IOVA (the same rules apply, as this is not special-cased for VMs). So,
> regardless of whether we have guest-mode page tables or just host IOMMU
> page tables, this address range should be reserved and not used.

IOVA doesn't make it any clearer; on the contrary, it's more confusing.

And does the host's HPA matter at all? (If the host's firmware isn't broken,
it should never use nor advertise the 1Tb hole.) So we are probably talking
here about GPA only.
   
> >>> also what about these use cases:
> >>>  * start QEMU with an Intel cpu model on an AMD host with Intel's iommu
> >>
> >> In principle it would be less likely to occur, but you would still need
> >> to mark the same range as reserved. The limitation is on DMA to those
> >> IOVAs (host or guest) that coincide with that range, so you would
> >> want to inform the guest that at least those should be avoided.
> >>  
> >>>  * start QEMU with AMD cpu model and AMD's iommu on Intel host    
> >>
> >> Here you would probably only mark the range, solely to honor how the
> >> hardware is usually represented. But really, on an Intel host, nothing
> >> stops you from exposing the aforementioned range as RAM.
> >>  
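
In other words, what decides whether the hole must be reserved is the host
vendor, not the guest CPU model. A minimal, hedged sketch of such a host-side
check (the function name is illustrative; this is not code from the series):

#include <cpuid.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative only: detect an AMD *host* via the CPUID vendor string. */
static bool host_cpu_is_amd(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13] = { 0 };

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    /* The vendor string is returned in EBX, EDX, ECX order. */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    return strcmp(vendor, "AuthenticAMD") == 0;
}

The range could then be reserved whenever this returns true, or
unconditionally (as opted for below) to stay safe for later VFIO hotplug.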
> >>>  * start QEMU in TCG mode on an AMD host (mostly from a qtest point of view)
> >>>     
> >> This one is tricky. Because you can hotplug a VFIO device later on,
> >> I opted for always marking the reserved range. If you don't use VFIO
> >> you're good, but otherwise you would still need the range reserved.
> >> But I am not sure how qtest is used today for testing huge guests.
> > I do not know if there are VFIO tests in qtest (probably not, since that
> > could require a host configured for it), but we can add a test for
> > this memory quirk (assuming phys-bits won't get in the way).
> >   
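
A rough sketch of what such a qtest could look like, under several assumptions
that would need checking: that the libqos fw_cfg helpers
(pc_fw_cfg_init/qfw_cfg_get_file) can be used this way, that the "etc/e820"
file reflects the final RAM layout, and that the host tolerates the -m size
and phys-bits chosen here:

#include "qemu/osdep.h"
#include "libqtest.h"
#include "libqos/fw_cfg.h"

/* Reserved HyperTransport window, per the ranges quoted above. */
#define AMD_HT_START    0xfd00000000ULL    /* 1012G */
#define AMD_HT_END      0x10000000000ULL   /* 1T    */

/* Assumed layout of an "etc/e820" entry (little-endian fields). */
struct e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
} QEMU_PACKED;

static void test_amd_1tb_hole(void)
{
    /* -m and phys-bits values are placeholders; whether a huge guest can
     * be started at all is exactly the open question above. */
    QTestState *qts = qtest_initf("-machine q35 -cpu max,phys-bits=42 "
                                  "-m 1100G");
    QFWCFG *fw_cfg = pc_fw_cfg_init(qts);
    struct e820_entry entries[128];
    size_t len = qfw_cfg_get_file(fw_cfg, "etc/e820", entries,
                                  sizeof(entries));

    for (size_t i = 0; i < len / sizeof(entries[0]); i++) {
        uint64_t start = le64_to_cpu(entries[i].address);
        uint64_t end = start + le64_to_cpu(entries[i].length);

        /* No RAM entry may overlap the reserved HyperTransport window. */
        g_assert(end <= AMD_HT_START || start >= AMD_HT_END);
    }

    pc_fw_cfg_uninit(fw_cfg);
    qtest_quit(qts);
}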
> 
>       Joao
> 



