From: Laszlo Ersek
Subject: Re: ovmf / PCI passthrough impaired due to very limiting PCI64 aperture
Date: Wed, 17 Jun 2020 18:14:23 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0 Thunderbird/52.9.1

On 06/17/20 15:46, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek@redhat.com) wrote:
>> On 06/16/20 19:14, Guilherme Piccoli wrote:
>>> Thanks Gerd, Dave and Eduardo for the prompt responses!
>>>
>>> So, I understand that when we use "host-phys-bits", we are
>>> passing the *real* number to the guest, correct? So, in this case we
>>> can trust that the guest physbits match the true host physbits.
>>>
>>> What if we then have OVMF rely on the physbits *iff*
>>> "host-phys-bits" is used (which is the default in RH and a possible
>>> machine configuration in the libvirt XML on Ubuntu), and have OVMF
>>> fall back to 36 bits otherwise?
>>
>> I've now read the commit message on QEMU commit 258fe08bd341d, and the
>> complexity is simply stunning.
>>
>> Right now, OVMF calculates the guest physical address space size from
>> various range sizes (such as hotplug memory area end, default or
>> user-configured PCI64 MMIO aperture), and derives the minimum suitable
>> guest-phys address width from that address space size. This width is
>> then exposed to the rest of the firmware with the CPU HOB (hand-off
>> block), which in turn controls how the GCD (global coherency domain)
>> memory space map is sized. Etc.
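
For illustration, here is a minimal sketch of that current "ranges ->
width" direction -- hypothetical function and parameter names, just to
make the shape of the calculation concrete; the real logic lives in
OvmfPkg/PlatformPei and differs in detail:

  STATIC
  UINT8
  GetPhysMemAddressWidth (
    IN UINT64  HotplugMemoryAreaEnd,  // exclusive end of hotplug area
    IN UINT64  Pci64ApertureEnd       // exclusive end of PCI64 aperture
    )
  {
    UINT64  FirstNonAddress;
    UINT8   PhysMemAddressWidth;

    //
    // Take the highest guest-phys address that any range reaches...
    //
    FirstNonAddress = MAX (HotplugMemoryAreaEnd, Pci64ApertureEnd);
    ASSERT (FirstNonAddress > 0);
    //
    // ...and derive the narrowest address width that covers it.
    //
    PhysMemAddressWidth = (UINT8)HighBitSet64 (FirstNonAddress);
    if (FirstNonAddress > LShiftU64 (1, PhysMemAddressWidth)) {
      PhysMemAddressWidth++;
    }
    //
    // This width is what gets published in the CPU HOB.
    //
    return PhysMemAddressWidth;
  }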
>>
>> If QEMU can provide a *reliable* GPA width, in some info channel (CPUID
>> or even fw_cfg), then the above calculation could be reversed in OVMF.
>> We could take the width as a given (-> produce the CPU HOB directly),
>> plus calculate the *remaining* address space between the GPA space size
>> given by the width, and the end of the memory hotplug area. If the
>> "remaining size" were negative, then obviously QEMU would have been
>> misconfigured, so we'd halt the boot. Otherwise, the remaining area
>> could be used as PCI64 MMIO aperture (PEI memory footprint of DXE page
>> tables be darned).
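
Sketched out, the reversed "width -> ranges" direction would look
something like this (hypothetical names again; QemuProvidedAddressWidth
stands for whatever reliable channel the width would arrive through):

  UINT8   QemuProvidedAddressWidth;  // assumed: read from QEMU
  UINT64  HotplugMemoryAreaEnd;      // assumed: computed as today
  UINT64  GpaSpaceSize;
  UINT64  Pci64Base;
  UINT64  Pci64Size;

  //
  // Take the QEMU-provided width as a given, and produce the CPU HOB
  // directly from it.
  //
  BuildCpuHob (QemuProvidedAddressWidth, 16 /* IO space bits */);
  GpaSpaceSize = LShiftU64 (1, QemuProvidedAddressWidth);
  if (HotplugMemoryAreaEnd > GpaSpaceSize) {
    //
    // The "remaining size" would be negative -- QEMU is misconfigured,
    // so halt the boot.
    //
    CpuDeadLoop ();
  }
  //
  // Whatever remains above the hotplug area becomes the PCI64 MMIO
  // aperture.
  //
  Pci64Base = HotplugMemoryAreaEnd;
  Pci64Size = GpaSpaceSize - Pci64Base;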
>>
>>> Now, regarding the problem "to trust or not" in the guests' physbits,
>>> I think it's an orthogonal discussion to some extent. It'd be nice to
>>> have that check, and as Eduardo said, prevent migration in such cases.
>>> But it doesn't really prevent a big OVMF PCI64 aperture if we only
>>> increase the aperture _when "host-phys-bits" is used_.
>>
>> I don't know what exactly those flags do, but I doubt they are clearly
>> visible to OVMF in any particular way.
> 
> The firmware should trust whatever it reads from the CPUID, and thus what
> it gets told by qemu; if qemu is doing the wrong thing there then that's
> our problem and we need to fix it in qemu.
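
(For reference, the CPUID interface in question is leaf 0x80000008,
which reports the physical address width in EAX bits 7:0; with edk2's
BaseLib, reading it boils down to something like:

  UINT32  RegEax;
  UINT8   GuestPhysBits;

  //
  // Leaf 0x80000008, EAX bits 7:0: physical address width.  A real
  // implementation would first check, via leaf 0x80000000, that this
  // leaf is available at all.
  //
  AsmCpuid (0x80000008, &RegEax, NULL, NULL, NULL);
  GuestPhysBits = (UINT8)(RegEax & 0xFF);

whether that value can be *trusted* is exactly what's at issue below.)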

This sounds good in principle, but -- as Gerd too has stated, to my
understanding -- it has the potential to break existing usage.

Consider assigning a single device with a 32G BAR -- right now that's
supposed to work, without the X-PciMmio64Mb OVMF knob, on even the "most
basic" hardware (36-bit host phys address width, and EPT supported). If
OVMF suddenly starts trusting the CPUID from QEMU, and that results in a
GPA width of 40 bits (i.e. new OVMF is run on old QEMU), then the big
BAR (and other stuff too) could be allocated from GPA space that EPT is
actually unable to deal with: with 36 host phys bits, EPT can only
translate the first 2^36 = 64 GiB of guest-phys address space, while a
40-bit width lets the firmware allocate up to 1 TiB. --> regression for
the user.

Sometimes I can tell users "hey, given that you're building OVMF from
source, or taking it from a 3rd party origin anyway, can you just run
upstream QEMU too", but most of the time they just want everything to
continue working on a 3-year-old Ubuntu LTS release or whatever. :/

And again, this is *without* "X-PciMmio64Mb".

Laszlo
