From: Joao Martins
Subject: Re: [PATCH RFC 1/6] i386/pc: Account IOVA reserved ranges above 4G boundary
Date: Wed, 23 Jun 2021 14:04:19 +0100


On 6/23/21 12:39 PM, Igor Mammedov wrote:
> On Wed, 23 Jun 2021 10:37:38 +0100
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> On 6/23/21 8:11 AM, Igor Mammedov wrote:
>>> On Tue, 22 Jun 2021 16:49:00 +0100
>>> Joao Martins <joao.m.martins@oracle.com> wrote:
>>>   
>>>> It is assumed that the whole GPA space is available to be
>>>> DMA addressable, within a given address space limit. Since
>>>> v5.4-based kernels that is no longer true, and VFIO will validate
>>>> whether the selected IOVA is indeed valid, i.e. not reserved by
>>>> the IOMMU on behalf of specific devices or the platform.
>>>>
>>>> AMD systems with an IOMMU are examples of such platforms and
>>>> particularly may export only these ranges as allowed:
>>>>
>>>>    0000000000000000 - 00000000fedfffff (0      .. 3.982G)
>>>>    00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
>>>>    0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb)
>>>>
>>>> We already account for the 4G hole, but if the guest is big
>>>> enough we will fail to allocate a guest larger than 1010G, given
>>>> the ~12G hole at the 1Tb boundary reserved for HyperTransport.
>>>>
>>>> When creating the region above 4G, take into account which
>>>> IOVAs are allowed by defining the known allowed ranges
>>>> and searching for the next free IOVA range. When we find an
>>>> invalid IOVA, we mark it as reserved and proceed to the
>>>> next allowed IOVA region.
>>>>
>>>> After accounting for the 1Tb hole on AMD hosts, mtree should
>>>> look like:
>>>>
>>>> 0000000100000000-000000fcffffffff (prio 0, i/o):
>>>>    alias ram-above-4g @pc.ram 0000000080000000-000000fc7fffffff
>>>> 0000010000000000-000001037fffffff (prio 0, i/o):
>>>>    alias ram-above-1t @pc.ram 000000fc80000000-000000ffffffffff  
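
For illustration, the bookkeeping the commit message describes might look
roughly like the C sketch below. All names here are invented; only
memory_region_init_alias() and memory_region_add_subregion() are QEMU's real
API, and the loop is a simplification of what the patch actually does:

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    typedef struct AllowedRange {
        hwaddr start;
        hwaddr end;    /* inclusive */
    } AllowedRange;

    /* AMD hosts: everything except the HyperTransport hole at 1Tb */
    static const AllowedRange amd_allowed[] = {
        { 0x0000000000000000ULL, 0x00000000fedfffffULL },
        { 0x00000000fef00000ULL, 0x000000fcffffffffULL },
        { 0x0000010000000000ULL, 0xffffffffffffffffULL },
    };

    /*
     * Carve one alias into system memory per allowed range until all
     * of pc.ram above 4G has a home.
     */
    static void map_ram_above_4g(MemoryRegion *sysmem, MemoryRegion *pc_ram,
                                 uint64_t ram_offset, uint64_t remaining)
    {
        for (size_t i = 1; i < ARRAY_SIZE(amd_allowed) && remaining; i++) {
            hwaddr gpa = MAX(amd_allowed[i].start, 0x100000000ULL);
            uint64_t len = MIN(remaining, amd_allowed[i].end - gpa + 1);
            MemoryRegion *alias = g_new(MemoryRegion, 1);

            memory_region_init_alias(alias, NULL,
                                     i == 1 ? "ram-above-4g" : "ram-above-1t",
                                     pc_ram, ram_offset, len);
            memory_region_add_subregion(sysmem, gpa, alias);
            ram_offset += len;
            remaining -= len;
        }
    }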
>>>
>>> why not push whole ram-above-4g above 1Tb mark
>>> when RAM is sufficiently large (regardless of used host),
>>> instead of creating yet another hole and all complexity it brings along?
>>>   
>>
>> There's the problem with CMOS, which describes memory above 4G; that is
>> part of the reason I cap it to 1TB minus the reserved range, i.e. for AMD,
>> CMOS would only describe up to 1T.
>>
>> But if we don't need to care about that, then it's an option, I suppose.
> we probably do not care about CMOS with RAM that large,
> as long as QEMU generates a correct E820 (CMOS mattered only with old
> SeaBIOS, which used it for generating the memory map)
> 
OK, good to know.

Any extension of CMOS would probably also be out of spec.
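
(For what it's worth, the above-4G CMOS field is only three bytes counted in
64KiB units, which is why it tops out at 1TiB; E820 has no such limit. Below
is a minimal sketch of how the split layout could be reported, assuming the
exact sizes from the mtree example above; report_split_ram_e820() is a
made-up wrapper, while e820_add_entry() and E820_RAM are QEMU's real
interfaces:)

    #include "qemu/osdep.h"
    #include "hw/i386/e820_memory_layout.h"

    static void report_split_ram_e820(void)
    {
        /* 4G .. ~1012G: the chunk below the HyperTransport hole */
        e820_add_entry(0x100000000ULL, 0xFC00000000ULL, E820_RAM);
        /* 1Tb upwards: the 14G remainder pushed past the hole */
        e820_add_entry(0x10000000000ULL, 0x380000000ULL, E820_RAM);
    }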

>> We would waste 1Tb of address space because of a 12G hole, and btw the
>> logic here is not so different from the 4G hole; in fact it could probably
>> be shared with it.
> the main reason I'm looking for an alternative is the complexity
> that making a hole brings in. At this point, we can't do anything
> about the 4G hole as it's already there, but we can try to avoid that
> for high RAM and keep the rules there as simple as they are now.
> 
Right. But for what it's worth, that complexity is spread across two parts:

1) dealing with a sparse RAM model (with more than one hole)

2) offsetting everything else that assumes a linear RAM map.

I don't think that even if we shift the start of RAM to after the 1TB boundary
we would get away without solving item 2 -- which personally is where I find
this a tad more hairy. So it would probably make this patch's complexity
smaller, but not change much in how spread out the changes get.
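
To make item 2 concrete: anything that assumed a linear "GPA = 4G + offset"
mapping for RAM above 4G now has to skip the hole. A hypothetical helper
(name and shape invented for illustration; the constants are the AMD values
from this thread):

    #include "qemu/osdep.h"
    #include "exec/hwaddr.h"

    #define ABOVE_4G_GPA_BASE 0x100000000ULL   /* 4G */
    #define HT_HOLE_START     0xFD00000000ULL  /* start of the ~12G HT hole */
    #define HT_HOLE_END       0x10000000000ULL /* 1Tb, first GPA after it */

    static hwaddr above_4g_offset_to_gpa(uint64_t ram_offset)
    {
        hwaddr gpa = ABOVE_4G_GPA_BASE + ram_offset;

        /* Skip over the HyperTransport reserved window */
        if (gpa >= HT_HOLE_START) {
            gpa += HT_HOLE_END - HT_HOLE_START;
        }
        return gpa;
    }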

> Also, partitioning/splitting main RAM is one of the things that
> gets in the way of converting it to the PC-DIMMs model.
> 
Can you expand on that? (a link to a series is enough)

> Losing 1Tb of address space might be acceptable on a host
> that can handle such amounts of RAM
> 


