qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH RFC 4/6] i386/pc: Keep PCI 64-bit hole within usable IOVA spa


From: Igor Mammedov
Subject: Re: [PATCH RFC 4/6] i386/pc: Keep PCI 64-bit hole within usable IOVA space
Date: Mon, 28 Jun 2021 17:37:19 +0200

On Wed, 23 Jun 2021 14:22:29 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:

> On 6/23/21 1:30 PM, Igor Mammedov wrote:
> > On Tue, 22 Jun 2021 16:49:03 +0100
> > Joao Martins <joao.m.martins@oracle.com> wrote:
> >   
> >> pci_memory initialized by q35 and i440fx is set to a range
> >> of 0 .. UINT64_MAX, and as a consequence when ACPI and pci-host
> >> pick the hole64_start it does not account for allowed IOVA ranges.
> >>
> >> Rather than blindly returning, round up the hole64_start
> >> value to the allowable IOVA range, such that it accounts for
> >> the 1Tb hole *on AMD*. On Intel it returns the input value
> >> for hole64 start.  
> > 
> > wouldn't that work only in case where guest firmware hadn't
> > mapped any PCI bars above 4Gb (possibly in not allowed region).
> > 
> > Looking at Seabios, it uses reserved_memory_end as the start
> > of PCI64 window. Not sure about OVMF,
> >  CCing Laszlo.
> >   
> Hmmm, perhaps only in the case that I don't advertise the reserved
> region (i.e. mem size not being big enough to let the guest know in
> the e820). But then that it begs the question, should we then always
> advertise the said HT region as reserved regardless of memory size?

Where in firmware that e820 reserved entry will be accounted for?
All I see in Seabios is qemu_cfg_e820() which accounts for E820_RAM (sometimes)
and there the high RAM is hardwired to 4G. Then later on it might get used
to setup the start of 64-bit PCI hole if etc/reserved-memory-end
doesn't exist.

And for E820_RAM to work, the entry with highest address has to be
the last in the table, which is fragile and it's only a question of
time when someone breaks that.
(well if it's explicitly specified in ACPI spec, I won't object
if properly documented)
So firmware side will also need changes to account for this,
and make it independent if 4G starting point.

For Seabios it might be better to make use of fwcfg file Laszlo has
suggested in his reply than relying on e820 & reserved-memory-end,
either way Seabios would need related patches for this to work
properly. For OVMF you can get away with QEMU only patches,
if using that knob.

(PS:
I'm not PCI expert so there might be more issues hidden there
)

> >> Suggested-by: David Edmondson <david.edmondson@oracle.com>
> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> >> ---
> >>  hw/i386/pc.c         | 17 +++++++++++++++--
> >>  hw/pci-host/i440fx.c |  4 +++-
> >>  hw/pci-host/q35.c    |  4 +++-
> >>  include/hw/i386/pc.h |  3 ++-
> >>  4 files changed, 23 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index 2e2ea82a4661..65885cc16037 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -1141,7 +1141,7 @@ void pc_memory_init(PCMachineState *pcms,
> >>   * The 64bit pci hole starts after "above 4G RAM" and
> >>   * potentially the space reserved for memory hotplug.
> >>   */
> >> -uint64_t pc_pci_hole64_start(void)
> >> +uint64_t pc_pci_hole64_start(uint64_t size)
> >>  {
> >>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> >>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> >> @@ -1155,12 +1155,25 @@ uint64_t pc_pci_hole64_start(void)
> >>              hole64_start += memory_region_size(&ms->device_memory->mr);
> >>          }
> >>      } else {
> >> -        hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
> >> +        if (!x86ms->above_1t_mem_size) {
> >> +            hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
> >> +        } else {
> >> +            hole64_start = x86ms->above_1t_maxram_start;
> >> +        }
> >>      }  
> >   
> >> +    hole64_start = allowed_round_up(hole64_start, size);  
> > 
> > I'm not sure but, might it cause problems if there were BARs placed
> > by firmware in region below rounded up value?
> > (i.e. ACPI description which uses PCI_HOST_PROP_PCI_HOLE_START property
> > won't match whatever firmware programmed due to rounding pushing hole
> > start up)
> >   
> But wouldn't then the problem be firmware ignoring E820 to avoid putting
> firmware region where it shouldn't? Unless of course, it wasn't advertised
> like I mentioned earlier.
> 
> >>      return ROUND_UP(hole64_start, 1 * GiB);
> >>  }
> >>  
> >> +uint64_t pc_pci_hole64_start_aligned(uint64_t start, uint64_t size)
> >> +{
> >> +    if (nb_iova_ranges == DEFAULT_NR_USABLE_IOVAS) {
> >> +        return start;
> >> +    }
> >> +    return allowed_round_up(start, size);
> >> +}
> >> +
> >>  DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus)
> >>  {
> >>      DeviceState *dev = NULL;
> >> diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
> >> index 28c9bae89944..e8eaebfe1034 100644
> >> --- a/hw/pci-host/i440fx.c
> >> +++ b/hw/pci-host/i440fx.c
> >> @@ -163,8 +163,10 @@ static uint64_t 
> >> i440fx_pcihost_get_pci_hole64_start_value(Object *obj)
> >>      pci_bus_get_w64_range(h->bus, &w64);
> >>      value = range_is_empty(&w64) ? 0 : range_lob(&w64);
> >>      if (!value && s->pci_hole64_fix) {
> >> -        value = pc_pci_hole64_start();
> >> +        value = pc_pci_hole64_start(s->pci_hole64_size);
> >>      }
> >> +    /* This returns @value when not on AMD */
> >> +    value = pc_pci_hole64_start_aligned(value, s->pci_hole64_size);
> >>      return value;
> >>  }
> >>  
> >> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> >> index 2eb729dff585..d556eb965ddb 100644
> >> --- a/hw/pci-host/q35.c
> >> +++ b/hw/pci-host/q35.c
> >> @@ -126,8 +126,10 @@ static uint64_t 
> >> q35_host_get_pci_hole64_start_value(Object *obj)
> >>      pci_bus_get_w64_range(h->bus, &w64);
> >>      value = range_is_empty(&w64) ? 0 : range_lob(&w64);
> >>      if (!value && s->pci_hole64_fix) {
> >> -        value = pc_pci_hole64_start();
> >> +        value = pc_pci_hole64_start(s->mch.pci_hole64_size);
> >>      }
> >> +    /* This returns @value when not on AMD */
> >> +    value = pc_pci_hole64_start_aligned(value, s->mch.pci_hole64_size);
> >>      return value;
> >>  }
> >>  
> >> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> >> index 73b8e2900c72..b924aef3a218 100644
> >> --- a/include/hw/i386/pc.h
> >> +++ b/include/hw/i386/pc.h
> >> @@ -217,7 +217,8 @@ void pc_memory_init(PCMachineState *pcms,
> >>                      MemoryRegion *system_memory,
> >>                      MemoryRegion *rom_memory,
> >>                      MemoryRegion **ram_memory);
> >> -uint64_t pc_pci_hole64_start(void);
> >> +uint64_t pc_pci_hole64_start(uint64_t size);
> >> +uint64_t pc_pci_hole64_start_aligned(uint64_t value, uint64_t size);
> >>  DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
> >>  void pc_basic_device_init(struct PCMachineState *pcms,
> >>                            ISABus *isa_bus, qemu_irq *gsi,  
> >   
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]