--- Begin Message ---
Subject: |
Re: [edk2] [RFC 4/4] OvmfPkg: PlatformPei: invert MTRR setup in QemuInitializeRam() |
Date: |
Wed, 10 Jun 2015 15:03:05 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 |
On 06/09/15 04:15, Laszlo Ersek wrote:
> On 06/08/15 23:46, Laszlo Ersek wrote:
>> At the moment we work with a UC default MTRR type, and set three memory
>> ranges to WB:
>> - [0, 640 KB),
>> - [1 MB, LowerMemorySize),
>> - [4 GB, 4 GB + UpperMemorySize).
>>
>> Unfortunately, coverage for the third range can fail with a high
>> likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
>> the size (UpperMemorySize) differ, then MtrrLib creates a series of
>> variable MTRR entries, with power-of-two sized MTRR masks. And, it's
>> really easy to run out of variable MTRR entries, dependent on the
>> alignment difference.
>>
>> This is a problem because a Linux guest will loudly reject any high memory
>> that is not covered my MTRR.
>>
>> So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
>> - flip the MTRR default type to WB,
>> - set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
>> type and variable MTRRs, so we can't avoid this,
>> - set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
>> - set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
>> more likely than the other scheme (due to less chaotic alignment
>> differences).
>>
>> Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
>> in PcdDebugPrintErrorLevel.
>>
>> BUG: Although the MTRRs look good to me in the OVMF debug log, I still
>> can't boot >= 64 GB guests with this. Instead of the complaints mentioned
>> above, the Linux guest apparently spirals into an infinite loop (on KVM),
>> or hangs with no CPU load (on TCG).
>
> No, actually there is no bug in this patch (so s/RFC/PATCH/). I did more
> testing and these are the findings:
> - I can reproduce the same issue on KVM with SeaBIOS guests.
> - The exact symptoms are that as soon as the highest guest-phys address
> is >= 64 GB, then the guest kernel doesn't boot. It gets stuck
> somewhere after hitting Enter in grub.
> - Normally 3 GB of the guest RAM is mapped under 4 GB in guest-phys
> address space, then there's a 1 GB PCI hole, and the rest is above
> 4 GB. This means that a 63 GB guest can be started (because 63 - 3 + 4
> == 64), but if you add just 1 MB more, it won't boot.
> - (This was the big discovery:) I flipped the "ept" parameter of the
> kvm_intel module on my host to N, and then things started to work. I
> just booted a 128 GB Linux guest with this patchset. (I have 4 GB
> RAM in my host, plus approx 250 GB swap.) The guest could see it all.
> - The TCG boot didn't hang either; I just couldn't wait earlier for
> network initialization to complete.
>
> I'm CC'ing Paolo for help with the EPT question. Other than that, this
> series is functional. (For QEMU/KVM at least; Xen will likely need more
> fixes from others.)
We have a root cause, it seems. The issue is that the processor in my
laptop, on which I tested, has only 36 bits for physical addresses:
$ grep 'address sizes' /proc/cpuinfo
address sizes : 36 bits physical, 48 bits virtual
...
Which matches where the problem surfaces (64 GB guest-phys address
space) with hw-supported nested paging (EPT) enabled on the host.
In order to confirm this, a colleague of mine gave me access to a server
with 96 GB of RAM, and:
address sizes : 46 bits physical, 48 bits virtual
On this host I booted a 72 GB OVMF guest on QEMU/KVM, with EPT enabled,
and according to the guest dmesg, the guest saw it all.
Memory: 74160924K/75493820K available (7735K kernel code, 1149K
rwdata, 3340K rodata, 1500K init, 1524K bss, 1332896K reserved, 0K
cma-reserved)
Maoming: since you reported this issue, please confirm that the patch
series resolves it for you as well. In that case, I'll repost the series
with "PATCH" as subject-prefix instead of "RFC", and I'll drop the BUG
note from the last commit message.
Thanks
Laszlo
>> Cc: Maoming <maoming.maoming@huawei.com>
>> Cc: Huangpeng (Peter) <peter.huangpeng@huawei.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>> ---
>> OvmfPkg/PlatformPei/MemDetect.c | 43
>> +++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 39 insertions(+), 4 deletions(-)
>>
>> diff --git a/OvmfPkg/PlatformPei/MemDetect.c
>> b/OvmfPkg/PlatformPei/MemDetect.c
>> index 3ceb142..cceab22 100644
>> --- a/OvmfPkg/PlatformPei/MemDetect.c
>> +++ b/OvmfPkg/PlatformPei/MemDetect.c
>> @@ -194,6 +194,8 @@ QemuInitializeRam (
>> {
>> UINT64 LowerMemorySize;
>> UINT64 UpperMemorySize;
>> + MTRR_SETTINGS MtrrSettings;
>> + EFI_STATUS Status;
>>
>> DEBUG ((EFI_D_INFO, "%a called\n", __FUNCTION__));
>>
>> @@ -214,12 +216,45 @@ QemuInitializeRam (
>> }
>> }
>>
>> - MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB,
>> CacheWriteBack);
>> + //
>> + // We'd like to keep the following ranges uncached:
>> + // - [640 KB, 1 MB)
>> + // - [LowerMemorySize, 4 GB)
>> + //
>> + // Everything else should be WB. Unfortunately, programming the inverse
>> (ie.
>> + // keeping the default UC, and configuring the complement set of the
>> above as
>> + // WB) is not reliable in general, because the end of the upper RAM can
>> have
>> + // practically any alignment, and we may not have enough variable MTRRs to
>> + // cover it exactly.
>> + //
>> + if (IsMtrrSupported ()) {
>> + MtrrGetAllMtrrs (&MtrrSettings);
>>
>> - MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, CacheWriteBack);
>> + //
>> + // MTRRs disabled, fixed MTRRs disabled, default type is uncached
>> + //
>> + ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
>> + ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
>> + ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);
>>
>> - if (UpperMemorySize != 0) {
>> - MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
>> + //
>> + // flip default type to writeback
>> + //
>> + SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
>> + ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
>> + MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
>> + MtrrSetAllMtrrs (&MtrrSettings);
>> +
>> + //
>> + // punch holes
>> + //
>> + Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
>> + SIZE_256KB + SIZE_128KB, CacheUncacheable);
>> + ASSERT_EFI_ERROR (Status);
>> +
>> + Status = MtrrSetMemoryAttribute (LowerMemorySize,
>> + SIZE_4GB - LowerMemorySize, CacheUncacheable);
>> + ASSERT_EFI_ERROR (Status);
>> }
>> }
>>
>>
>
------------------------------------------------------------------------------
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel
--- End Message ---