qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt"
Date: Tue, 2 Apr 2019 12:33:43 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 04/02/19 09:42, Igor Mammedov wrote:
> On Mon, 1 Apr 2019 15:07:05 +0200
> Laszlo Ersek <address@hidden> wrote:
> 
>> On 03/29/19 14:56, Auger Eric wrote:
>>> Hi Ard,
>>>
>>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote:  
>>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <address@hidden> wrote:  
>>>>>
>>>>> Hi Shameer,
>>>>>
>>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote:  
>>>>>>
>>>>>>  
>>>>>>> -----Original Message-----
>>>>>>> From: Auger Eric [mailto:address@hidden
>>>>>>> Sent: 29 March 2019 09:32
>>>>>>> To: Shameerali Kolothum Thodi <address@hidden>;
>>>>>>> address@hidden; address@hidden; address@hidden;
>>>>>>> address@hidden; address@hidden;
>>>>>>> address@hidden; address@hidden
>>>>>>> Cc: Linuxarm <address@hidden>; xuwei (O) <address@hidden>;
>>>>>>> Laszlo Ersek <address@hidden>; Ard Biesheuvel
>>>>>>> <address@hidden>; Leif Lindholm <address@hidden>
>>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature 
>>>>>>> "fdt"
>>>>>>>
>>>>>>> Hi Shameer,
>>>>>>>
>>>>>>> [ + Laszlo, Ard, Leif ]
>>>>>>>
>>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote:  
>>>>>>>> This is to disable/enable populating DT nodes in case
>>>>>>>> any conflict with acpi tables. The default is "off".  
>>>>>>> The name of the option sounds misleading to me. Also we don't really
>>>>>>> know the scope of the disablement. At the moment this just aims to
>>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI 
>>>>>>> mode.
>>>>>>>  
>>>>>>>>
>>>>>>>> This will be used in subsequent patch where cold plug
>>>>>>>> device-memory support is added for DT boot.  
>>>>>>> I am concerned about the fact that in dt mode, by default, you won't see
>>>>>>> any PCDIMM nodes.  
>>>>>>>>
>>>>>>>> If DT memory node support is added for cold-plugged device
>>>>>>>> memory, those memory will be visible to Guest kernel via
>>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory.  
>>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether
>>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this
>>>>>>> info.  
>>>>>>
>>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution.
>>>>>>
>>>>>> Also, to be more clear on what happens,
>>>>>>
>>>>>> Guest ACPI boot with "fdt=on" ,
>>>>>>
>>>>>> From kernel log,
>>>>>>
>>>>>> [    0.000000] Early memory node ranges
>>>>>> [    0.000000]   node   0: [mem 0x0000000040000000-0x00000000bbf5ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bbf60000-0x00000000bbffffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bc000000-0x00000000bc02ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bc030000-0x00000000bc36ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bc370000-0x00000000bf64ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bf650000-0x00000000bf6dffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bf6e0000-0x00000000bf6effff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bf6f0000-0x00000000bf80ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bf810000-0x00000000bfffffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000c0000000-0x00000000ffffffff]  
>>>>>> --> This is the hotpluggable memory node from DT.
>>>>>> [    0.000000] Zeroed struct page in unavailable ranges: 1040 pages
>>>>>> [    0.000000] Initmem setup node 0 [mem 
>>>>>> 0x0000000040000000-0x00000000ffffffff]
>>>>>>
>>>>>>
>>>>>> Guest ACPI boot with "fdt=off" ,
>>>>>>
>>>>>> [    0.000000] Movable zone start for each node
>>>>>> [    0.000000] Early memory node ranges
>>>>>> [    0.000000]   node   0: [mem 0x0000000040000000-0x00000000bbf5ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bbf60000-0x00000000bbffffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bc000000-0x00000000bc02ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bc030000-0x00000000bc36ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bc370000-0x00000000bf64ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bf650000-0x00000000bf6dffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bf6e0000-0x00000000bf6effff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bf6f0000-0x00000000bf80ffff]
>>>>>> [    0.000000]   node   0: [mem 0x00000000bf810000-0x00000000bfffffff]
>>>>>> [    0.000000] Zeroed struct page in unavailable ranges: 1040 pages
>>>>>> [    0.000000] Initmem setup node 0 [mem 
>>>>>> 0x0000000040000000-0x00000000bfffffff
>>>>>>
>>>>>> The hotpluggable memory node is absent from early memory nodes here.  
>>>>>
>>>>> OK thank you for the example illustrating the concern.  
>>>>>>
>>>>>> As you said, it could be possible to detect this node using SRAT in 
>>>>>> UEFI.  
>>>>>
>>>>> Let's wait for EDK2 experts on this.
>>>>>  
>>>>
>>>> Happy to chime in, but I need a bit more context here.
>>>>
>>>> What is the problem, how does this path try to solve it, and why is
>>>> that a bad idea?
>>>>  
>>> Sure, sorry.
>>>
>>> This series:
>>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support,
>>> https://patchwork.kernel.org/cover/10863301/
>>>
>>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the
>>> SRAT and DSDT parts and relies on GED to trigger the hotplug.
>>>
>>> We noticed that if we build the hotpluggable memory dt nodes on top of
>>> the above ACPI tables, the DIMM slots are interpreted as not
>>> hotpluggable memory slots (at least we think so).
>>>
>>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the
>>> fact that those slots are exposed as hotpluggable in the SRAT for example.
>>>
>>> So in this series, we are forced to not generate the hotpluggable memory
>>> dt nodes if we want the DIMM slots to be effectively recognized as
>>> hotpluggable.
>>>
>>> Could you confirm we have a correct understanding of the EDK2 behaviour
>>> and if so, would there be any solution for EDK2 to absorb both the DT
>>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable.
>>>
>>> At qemu level, detecting we are booting in ACPI mode and purposely
>>> removing the above mentioned DT nodes does not look straightforward.  
>>
>> The firmware is not enlightened about the ACPI content that comes from
>> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware,
>> as instructed through the ACPI linker/loader script, in order to install
>> the ACPI content for the OS. No actual information is consumed by the
>> firmware from the ACPI payload -- and that's a feature.
>>
>> The firmware does consume DT:
>>
>> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by
>> the firmware (for its own information needs), and passed on to the OS.
>>
>> - If you start QEMU *without* "-no-acpi" (the default), then the DT is
>> consumed only by the firmware (for its own information needs), and the
>> DT is hidden from the OS. The OS gets only the ACPI content
>> (processed/prepared as described above).
>>
>>
>> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the
>> base/size pairs in all the memory nodes in the DT. For each such base
>> address that is currently tracked as "nonexistent" in the GCD memory
>> space map, the driver currently adds the base/size range as "system
>> memory". This in turn is reflected by the UEFI memmap that the OS gets
>> to see as "conventional memory".
>>
>> If you need some memory ranges to show up as "special" in the UEFI
>> memmap, then you need to distinguish them somehow from the "regular"
>> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the
>> firmware, so that it act upon the discriminator that you set in the DT.
>>
>>
>> Now... from a brief look at the Platform Init and UEFI specs, my
>> impression is that the hotpluggable (but presently not plugged) DIMM
>> ranges should simply be *absent* from the UEFI memmap; is that correct?
>> (I didn't check the ACPI spec, maybe it specifies the expected behavior
>> in full.) If my impression is correct, then two options (alternatives)
>> exist:
>>
>> (1) Hide the affected memory nodes -- or at least the affected base/size
>> pairs -- from the DT, in case you boot without "-no-acpi" but with an
>> external firmware loaded. Then the firmware will not expose those ranges
>> as "conventional memory" in the UEFI memmap. This approach requires no
>> changes to edk2.
>>
>> This option is precisely what Eric described up-thread, at
>> <http://mid.mail-archive.com/address@hidden>:
>>
>>> in machvirt_init, there is firmware_loaded that tells you whether you
>>> have a FW image. If this one is not set, you can induce dt. But if
>>> there is a FW it can be either DT or ACPI booted. You also have the
>>> acpi_enabled knob.  
>>
>> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in
>> "vl.c").
>>
>> So, the condition for hiding the hotpluggable memory nodes in question
>> from the DT is:
>>
>>   (aarch64 && firmware_loaded && acpi_enabled)
> I'd go with this one, though I have a question for firmware side.
> Let's assume we would want in future to expose hotpluggable & present
> memory via GetMemoryMap() (like bare-metal does) (guest OS theoretically
> can avoid using it for Normal zone based on hint from SRAT table early
> at boot), but what about firmware can it inspect SRAT table and not use
> hotpluggable ranges for its own use (or at least do not canibalize
> them permanently)?

This is actually two questions:

(a) Can the firmware inspect SRAT?

If the SRAT table structure isn't very complex, this is technically
doable, but the wrong thing to do, IMO.

First, we've tried hard to avoid enlightening the firmware about the
semantics of QEMU's ACPI tables.

Second, this would introduce an ordering constraint (or callbacks) in
the firmware, between the driver that processes & installs the ACPI
tables, and the driver that translates the memory nodes of the DT to the
memory ranges known to UEFI and the OS.

If we need such hinting, then option (2) below (from earlier context)
would be better:
- If it's OK to use an arm/aarch64 specific solution, then new DT
properties should work.
- If it should be arch-independent, then a dedicated fw_cfg file would
be better.

(b) Assuming we have the information from some source, can the firmware
expose some memory ranges as "usable RAM" to the OS, while staying away
from them for its own (firmware) purposes?

After consulting

  Table 25. Memory Type Usage before ExitBootServices()
  Table 26. Memory Type Usage after ExitBootServices()

in UEFI-2.7, I would say that the firmware driver that installs these
ranges to the memory (space) map should also allocate the ranges right
after, as EfiBootServicesData. This will prevent other drivers /
applications in the firmware from allocating chunks out of those areas,
and the OS will be at liberty to release and repurpose the ranges after
ExitBootServices().

Thanks,
Laszlo

>> (2) Invent and set an "ignore me, firmware" property for the
>> hotpluggable memory nodes in the DT, and update the firmware to honor
>> that property.
>>
>> Thanks
>> Laszlo
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]