qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] 64-bit MMIO aperture expansion


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] 64-bit MMIO aperture expansion
Date: Fri, 21 Sep 2018 13:20:39 -0400

On Fri, Sep 21, 2018 at 06:01:30PM +0300, Marcel Apfelbaum wrote:
> 
> 
> On 09/20/2018 05:49 PM, Laszlo Ersek wrote:
> > Hi Marcel,
> 
> Hi Laszlo,
> I had to read this mail a few times...
> 
> > 
> > this email should actually be an RFC patch. But RFC patches tend to turn
> > into real PATCHes (if the submitter is lucky, that is), and I can't
> > really promise sending multiple versions of a PATCH at this time. So
> > please consider this a "maybe bug report".
> > 
> > In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> > hole", 2017-11-16) we added logic so that QEMU expand the 64-bit PCI
> > hole (for hotplug purposes), if (a) the firmware doesn't "configure" one
> > (via programming individual BARs with 64-bit addresses), or (b) the
> > firmware's programming results in an aperture smaller than we'd like
> > (32GB on Q35).
> 
> Right
> 
> > 
> > We made sure that the aperture required by the firmware's programming
> > would never be shrunken or otherwise truncated by QEMU, so that's fine.
> > However, the expansion doesn't work as "widely" in all cases as it
> > should.
> > 
> > Consider the following three functions, at current master (= commit
> > 19b599f7664b):
> > 
> > [hw/i386/pc.c]
> > 
> > > /*
> > >   * The 64bit pci hole starts after "above 4G RAM" and
> > >   * potentially the space reserved for memory hotplug.
> > >   */
> > > uint64_t pc_pci_hole64_start(void)
> > > {
> > >      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> > >      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> > >      MachineState *ms = MACHINE(pcms);
> > >      uint64_t hole64_start = 0;
> > > 
> > >      if (pcmc->has_reserved_memory && ms->device_memory->base) {
> > >          hole64_start = ms->device_memory->base;
> > >          if (!pcmc->broken_reserved_end) {
> > >              hole64_start += memory_region_size(&ms->device_memory->mr);
> > >          }
> > >      } else {
> > >          hole64_start = 0x100000000ULL + pcms->above_4g_mem_size;
> > >      }
> > > 
> > >      return ROUND_UP(hole64_start, 1 * GiB);
> > > }
> > [hw/pci-host/q35.c]
> > 
> > > /*
> > >   * The 64bit PCI hole start is set by the Guest firmware
> > >   * as the address of the first 64bit PCI MEM resource.
> > >   * If no PCI device has resources on the 64bit area,
> > >   * the 64bit PCI hole will start after "over 4G RAM" and the
> > >   * reserved space for memory hotplug if any.
> > >   */
> > > static void q35_host_get_pci_hole64_start(Object *obj, Visitor *v,
> > >                                            const char *name, void *opaque,
> > >                                            Error **errp)
> > > {
> > >      PCIHostState *h = PCI_HOST_BRIDGE(obj);
> > >      Q35PCIHost *s = Q35_HOST_DEVICE(obj);
> > >      Range w64;
> > >      uint64_t value;
> > > 
> > >      pci_bus_get_w64_range(h->bus, &w64);
> > >      value = range_is_empty(&w64) ? 0 : range_lob(&w64);
> > >      if (!value && s->pci_hole64_fix) {
> > >          value = pc_pci_hole64_start();
> > >      }
> > >      visit_type_uint64(v, name, &value, errp);
> > > }
> > > 
> > > /*
> > >   * The 64bit PCI hole end is set by the Guest firmware
> > >   * as the address of the last 64bit PCI MEM resource.
> > >   * Then it is expanded to the PCI_HOST_PROP_PCI_HOLE64_SIZE
> > >   * that can be configured by the user.
> > >   */
> > > static void q35_host_get_pci_hole64_end(Object *obj, Visitor *v,
> > >                                          const char *name, void *opaque,
> > >                                          Error **errp)
> > > {
> > >      PCIHostState *h = PCI_HOST_BRIDGE(obj);
> > >      Q35PCIHost *s = Q35_HOST_DEVICE(obj);
> > >      uint64_t hole64_start = pc_pci_hole64_start();
> > >      Range w64;
> > >      uint64_t value, hole64_end;
> > > 
> > >      pci_bus_get_w64_range(h->bus, &w64);
> > >      value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1;
> > >      hole64_end = ROUND_UP(hole64_start + s->mch.pci_hole64_size, 1ULL << 
> > > 30);
> > >      if (s->pci_hole64_fix && value < hole64_end) {
> > >          value = hole64_end;
> > >      }
> > >      visit_type_uint64(v, name, &value, errp);
> > > }
> > > 
> > Now consider the following scenario:
> > 
> > - the firmware programs some BARs with 64-bit addresses such that the
> >    aperture that we deduce starts at 32GB,
> > 
> > - the guest has 4GB of RAM, and no DIMM hotplug range.
> > 
> > Consequences:
> > 
> > - Because the "32-bit RAM split" for Q35 is at 2GB, the
> >    pc_pci_hole64_start() function will return 6GB.
> > 
> > - The q35_host_get_pci_hole64_start() function will return 32GB. (It
> >    will not fall back to pc_pci_hole64_start() -- correctly -- because
> >    the firmware has programmed some BARs with 64-bit addresses.)
> > 
> > - The q35_host_get_pci_hole64_end() function *intends* to return 64GB,
> >    because -- let's say -- the guest assigned BARs covering the
> >    32GB..34GB range, which is 2GB in size, and we *intend* to round that
> >    size up to 32GB, so that 30GB be left for hotplug purposes. (This is
> >    the original intent of commit 9fa99d2519cb.)
> > - However, because we initialize "hole64_start" from
> >    pc_pci_hole64_start(), and not from q35_host_get_pci_hole64_start(),
> >    we add "mch.pci_hole64_size" (32GB by default) to 6GB (the end of
> >    RAM), and not to 32GB (the aperture base deduced from the firmware's
> >    programming). As a result, we'll extend the aperture end address only
> >    to 38GB, and not to 64GB.
> 
> Right, there is no reason to use pc_pci_hole64_start, it looks
> like a plain bug. We diverged from pc and the fact that
> q35_host_get_pci_hole64_star uses it is only an implementation
> detail.
> 
> > My suggestion is simply to initialize "hole64_start" from
> > q35_host_get_pci_hole64_start(), in the q35_host_get_pci_hole64_end()
> > function. If the firmware doesn't program 64-bit addresses, then this
> > change is a no-op -- q35_host_get_pci_hole64_start() will fall back to
> > pc_pci_hole64_start() in that case. Otherwise, the behavior will be
> > fixed.
> 
> Looks OK to me.
> 
> > Now, there's another complication, obviously -- machine type compat. In
> > commit 9fa99d2519cb, we added the "pci_hole64_fix" compat property. I
> > assume the additional fix I'm proposing requires another compat
> > property?
> 
> We have to, is a guest visible change. I really don't like these compat
> properties, but I don't see a way around it.

Well does it only affect ACPI? Or other stuff? ACPI changes
are mostly safe without need for compat things.

> >   Something like:
> > 
> > > diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> > > index 02f95765880a..c02b128189cd 100644
> > > --- a/hw/pci-host/q35.c
> > > +++ b/hw/pci-host/q35.c
> > > @@ -138,12 +138,14 @@ static void q35_host_get_pci_hole64_end(Object 
> > > *obj, Visitor *v,
> > >   {
> > >       PCIHostState *h = PCI_HOST_BRIDGE(obj);
> > >       Q35PCIHost *s = Q35_HOST_DEVICE(obj);
> > > -    uint64_t hole64_start = pc_pci_hole64_start();
> > > +    uint64_t hole64_start;
> > >       Range w64;
> > >       uint64_t value, hole64_end;
> > > 
> > >       pci_bus_get_w64_range(h->bus, &w64);
> > >       value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1;
> > > +    hole64_start = s->pci_hole64_fix2 ? q35_host_get_pci_hole64_start() :
> > > +                                        pc_pci_hole64_start();
> > >       hole64_end = ROUND_UP(hole64_start + s->mch.pci_hole64_size, 1ULL 
> > > << 30);
> > >       if (s->pci_hole64_fix && value < hole64_end) {
> > >           value = hole64_end;
> > The same would apply to i440fx too.
> 
> I am lost here. The q35 PCI 64bit hole computation issue starts from the
> miss-use
> of the PC conterpart functions. What is the problem with the PC?
> 
> > 
> > What do you think?
> 
> I think is a clear bug, even we can say we promised "32G" hole and we do
> provide it.
> But we clearly intended to have 32G-64G space in this case.
> 
> Michael, what do you think?
> 
> Thanks,
> Marcel
> 

I agree here.

> > 
> > Thanks
> > Laszlo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]