From: Laszlo Ersek
Subject: Re: [Qemu-devel] [RFC 3/3] acpi-build: allocate mcfg for multiple host bridges
Date: Wed, 23 May 2018 19:25:04 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

On 05/23/18 19:11, Marcel Apfelbaum wrote:
> On 05/23/2018 10:32 AM, Laszlo Ersek wrote:
>> On 05/23/18 01:40, Michael S. Tsirkin wrote:
>>> On Wed, May 23, 2018 at 12:42:09AM +0200, Laszlo Ersek wrote:

>>>> If we figure out a placement strategy or an easy to consume
>>>> representation of these data for the firmware, it might be possible
>>>> for OVMF to hook them into the edk2 core (although not in the
>>>> earliest firmware phases, such as SEC and PEI).
> Can you please remind me how OVMF places the 64-bit PCI hotplug
> window?

If you mean the 64-bit PCI MMIO aperture, I described it here in detail:


I'll also quote it inline, before returning to your email:

On 03/26/18 16:10, address@hidden wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=1353591
> Laszlo Ersek <address@hidden> changed:
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>               Flags|needinfo?(address@hidden |
>                    |)                           |
> --- Comment #8 from Laszlo Ersek <address@hidden> ---
> Sure, I can attempt :) The function to look at is GetFirstNonAddress()
> in "OvmfPkg/PlatformPei/MemDetect.c". I'll try to write it up here in
> natural language (although I commented the function heavily as well).
> As an introduction, the "number of address bits" is a quantity that
> the firmware itself needs to know, so that in the DXE phase page
> tables exist that actually map that address space. The
> GetFirstNonAddress() function (in the PEI phase) calculates the
> highest *exclusive* address that the firmware might want or need to
> use (in the DXE phase).
> (1) First we get the highest exclusive cold-plugged RAM address.
> (There are two methods for this, the more robust one is to read QEMU's
> E820 map, the older / less robust one is to calculate it from the
> CMOS.) If the result would be <4GB, then we take exactly 4GB from this
> step, because the firmware always needs to be able to address up to
> 4GB. Note that this is already somewhat non-intuitive; for example, if
> you have 4GB of RAM (as in, *amount*), it will go up to 6GB in the
> guest-phys address space (because [0x8000_0000..0xFFFF_FFFF] is not
> RAM but MMIO on q35).
> (2) If the DXE phase is 32-bit, then we're done. (No addresses >=4GB
> can be accessed, either for RAM or MMIO.) For RHEL this is never the
> case.
> (3) Grab the size of the 64-bit PCI MMIO aperture. This defaults to
> 32GB, but a custom (OVMF specific) fw_cfg file (from the QEMU command
> line) can resize it or even disable it. This aperture is relevant
> because it's going to be the top of the address space that the
> firmware is interested in. If the aperture is disabled (on the QEMU
> cmdline), then we're done, and only the value from point (1) matters
> -- that determines the address width we need.
> (4) OK, so we have a 64-bit PCI MMIO aperture (for allocating BARs out
> of, later); we have to place it somewhere. The base cannot match the
> value from (1) directly, because that would not leave room for the
> DIMM hotplug area. So the end of that area is read from the fw_cfg
> file "etc/reserved-memory-end". DIMM hotplug is enabled iff
> "etc/reserved-memory-end" exists. If "etc/reserved-memory-end" exists,
> then it is guaranteed to be larger than the value from (1) -- i.e.,
> top of cold-plugged RAM.
> (5) We round up the size of the 64-bit PCI aperture to 1GB. We also
> round up the base of the same -- i.e., from (4) or (1), as appropriate
> -- to 1GB. This is inspired by SeaBIOS, because this lets the host map
> the aperture with 1GB hugepages.
> (6) The base address of the aperture is then rounded up so that it
> ends up aligned "naturally". "Natural" alignment means that we take
> the largest whole power of two (i.e., BAR size) that can fit *within*
> the aperture (whose size comes from (3) and (5)) and use that BAR size
> as alignment requirement. This is because the PciBusDxe driver sorts
> the BARs in decreasing size order (and equivalently, decreasing
> alignment order), for allocation in increasing address order, so if
> our aperture base is aligned sufficiently for the largest BAR that can
> theoretically fit into the aperture, then the base will be aligned
> correctly for *any* other BAR that fits.
> For example, if you have a 32GB aperture size, then the largest BAR
> that can fit is 32GB, so the alignment requirement in step (6) will be
> 32GB. Whereas, if the user configures a 48GB aperture size in (3),
> then your alignment will remain 32GB in step (6), because a 64GB BAR
> would not fit, and a 32GB BAR (which fits) dictates a 32GB alignment.
> Thus we have the following "ladder" of ranges:
> (a) cold-plugged RAM (low, <2GB)
> (b) 32-bit PCI MMIO aperture, ECAM/MMCONFIG, APIC, pflash, etc (<4GB)
> (c) cold-plugged RAM (high, >=4GB)
> (d) DIMM hot-plug area
> (e) padding up to 1GB alignment (for hugepages)
> (f) padding up to the natural alignment of the 64-bit PCI MMIO
>    aperture size (32GB by default)
> (g) 64-bit PCI MMIO aperture
> To my understanding, "maxmem" determines the end of (d). And, the
> address width is dictated by the end of (g).
> Two more examples.
> - If you have 36 phys address bits, that doesn't let you use
>   maxmem=32G. This is because maxmem=32G puts the end of the DIMM
>   hotplug area (d) strictly *above* 32GB (due to the "RAM gap" (b)),
>   and then the padding (f) places the 64-bit PCI MMIO aperture at
>   64GB. So 36 phys address bits don't suffice.
> - On the other hand, if you have 37 phys address bits, that *should*
>   let you use maxmem=64G. While the DIMM hot-plug area will end
>   strictly above 64GB, the 64-bit PCI MMIO aperture (of size 32GB) can
>   be placed at 96GB, so it will all fit into 128GB (i.e. 37 address
>   bits).
> Sorry if this is confusing, I got very little sleep last night.
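
For readers who prefer code to prose, steps (1) through (6) of the quoted comment can be sketched roughly as below. This is a Python illustration only, not the C code in "OvmfPkg/PlatformPei/MemDetect.c". The function and parameter names are invented for the sketch, the address width is taken as the number of bits needed to address everything below the exclusive limit, and the 34GB / 66GB hotplug-area ends are assumed values standing in for "strictly above 32GB" and "strictly above 64GB".

```python
GB = 1 << 30

def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment

def largest_power_of_two_le(value):
    """Largest whole power of two that fits within value (value > 0)."""
    return 1 << (value.bit_length() - 1)

def first_non_address(ram_top, reserved_memory_end=None,
                      pci64_size=32 * GB, dxe_is_64bit=True):
    """Highest exclusive address the firmware may need (hypothetical model).

    ram_top             -- exclusive top of cold-plugged RAM, step (1)
    reserved_memory_end -- value of "etc/reserved-memory-end", or None
                           if the file is absent (no DIMM hotplug)
    pci64_size          -- 64-bit PCI MMIO aperture size; 0 means disabled
    """
    # (1) The firmware must always be able to address up to 4GB.
    limit = max(ram_top, 4 * GB)
    # (2) A 32-bit DXE phase cannot use anything at or above 4GB.
    if not dxe_is_64bit:
        return limit
    # (3) With the aperture disabled, only step (1) matters.
    if pci64_size == 0:
        return limit
    # (4) Leave room for the DIMM hotplug area, if there is one.
    if reserved_memory_end is not None:
        limit = reserved_memory_end
    # (5) Round both size and base up to 1GB (for hugepages).
    size = align_up(pci64_size, GB)
    base = align_up(limit, GB)
    # (6) Natural alignment: the largest BAR that fits in the aperture.
    base = align_up(base, largest_power_of_two_le(size))
    return base + size

def address_bits(limit):
    """Bits needed to address every byte below the exclusive limit."""
    return (limit - 1).bit_length()

# maxmem=32G case: hotplug area assumed to end at 34GB.
end_32g = first_non_address(4 * GB, reserved_memory_end=34 * GB)
# maxmem=64G case: hotplug area assumed to end at 66GB.
end_64g = first_non_address(4 * GB, reserved_memory_end=66 * GB)
print(end_32g // GB, address_bits(end_32g))
print(end_64g // GB, address_bits(end_64g))
```

Under these assumptions, the first case puts the aperture at 64GB and needs more than 36 address bits, while the second puts it at 96GB and fits in 37 bits, in line with the two examples in the quoted comment.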

Back to your email:

On 05/23/18 19:11, Marcel Apfelbaum wrote:
> I think we may be able to succeed with "standard" ACPI declarations of
> the PCI segments + placing the extra MMCONFIG ranges before the 64-bit
> PCI hotplug area.

That idea could work, but firmware will need hints about it.

