qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-stable] [Qemu-devel] [PATCH] pc: acpi: revert back to 1 SRAT e


From: Eduardo Habkost
Subject: Re: [Qemu-stable] [Qemu-devel] [PATCH] pc: acpi: revert back to 1 SRAT entry for hotpluggable area
Date: Wed, 22 Aug 2018 15:01:12 -0300
User-agent: Mutt/1.9.2 (2017-12-15)

On Wed, Aug 22, 2018 at 03:05:36PM +0200, Igor Mammedov wrote:
> On Wed, 22 Aug 2018 12:06:26 +0200
> Laszlo Ersek <address@hidden> wrote:
> 
> > On 08/22/18 11:46, Igor Mammedov wrote:
> > > Commit
> > >   10efd7e108 "pc: acpi: fix memory hotplug regression by reducing stub 
> > > SRAT entry size"
> > > attemped to fix hotplug regression introduced by
> > >   848a1cc1e "hw/acpi-build: build SRAT memory affinity structures for 
> > > DIMM devices"
> > > 
> > > fixed issue for Windows/3.0+ linux kernels, however it regressed 2.6 based
> > > kernels (RHEL6) to the point where guest might crash at boot.
> > > Reason is that 2.6 kernel discards SRAT table due too small last entry
> > > which down the road leads to crashes. Hack I've tried in 10efd7e108 is 
> > > also
> > > not ACPI spec compliant according to which whole possible RAM should be
> > > described in SRAT. Revert 10efd7e108 to fix regression for 2.6 based 
> > > kernels.
> > > 
> > > With 10efd7e108 reverted, I've also tried splitting SRAT table statically
> > > in different ways %/node and %/slot but Windows still fails to online
> > > 2nd pc-dimm hot-plugged into node 0 (as described in 10efd7e108) and
> > > sometimes even coldplugged pc-dimms where affected with static SRAT
> > > partitioning.
> > > The only known so far way where Windows stays happy is when we have 1
> > > SRAT entry in the last node covering all hotplug area.
> > > 
> > > Revert 848a1cc1e until we come up with a way to avoid regression
> > > on Windows with hotplug area split in several entries.
> > > Tested this with 2.6/3.0 based kernels (RHEL6/7) and WS20[08/12/12R2/16]).
> > > 
> > > Signed-off-by: Igor Mammedov <address@hidden>
> > > ---
> > > CC: address@hidden
> > > CC: address@hidden
> > > CC: address@hidden
> > > CC: address@hidden
> > > CC: address@hidden
> > > ---
> > >  hw/i386/acpi-build.c | 73 
> > > +++++++++-------------------------------------------
> > >  1 file changed, 12 insertions(+), 61 deletions(-)
> > > 
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index e1ee8ae..1599caa 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -2251,64 +2251,6 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, 
> > > GArray *tcpalog)
> > >  #define HOLE_640K_START  (640 * KiB)
> > >  #define HOLE_640K_END   (1 * MiB)
> > >  
> > > -static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t 
> > > base,
> > > -                                           uint64_t len, int 
> > > default_node)
> > > -{
> > > -    MemoryDeviceInfoList *info_list = qmp_memory_device_list();
> > > -    MemoryDeviceInfoList *info;
> > > -    MemoryDeviceInfo *mi;
> > > -    PCDIMMDeviceInfo *di;
> > > -    uint64_t end = base + len, cur, size;
> > > -    bool is_nvdimm;
> > > -    AcpiSratMemoryAffinity *numamem;
> > > -    MemoryAffinityFlags flags;
> > > -
> > > -    for (cur = base, info = info_list;
> > > -         cur < end;
> > > -         cur += size, info = info->next) {
> > > -        numamem = acpi_data_push(table_data, sizeof *numamem);
> > > -
> > > -        if (!info) {
> > > -            /*
> > > -             * Entry is required for Windows to enable memory hotplug in 
> > > OS
> > > -             * and for Linux to enable SWIOTLB when booted with less than
> > > -             * 4G of RAM. Windows works better if the entry sets 
> > > proximity
> > > -             * to the highest NUMA node in the machine at the end of the
> > > -             * reserved space.
> > > -             * Memory devices may override proximity set by this entry,
> > > -             * providing _PXM method if necessary.
> > > -             */
> > > -            build_srat_memory(numamem, end - 1, 1, default_node,
> > > -                              MEM_AFFINITY_HOTPLUGGABLE | 
> > > MEM_AFFINITY_ENABLED);
> > > -            break;
> > > -        }
> > > -
> > > -        mi = info->value;
> > > -        is_nvdimm = (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM);
> > > -        di = !is_nvdimm ? mi->u.dimm.data : mi->u.nvdimm.data;
> > > -
> > > -        if (cur < di->addr) {
> > > -            build_srat_memory(numamem, cur, di->addr - cur, default_node,
> > > -                              MEM_AFFINITY_HOTPLUGGABLE | 
> > > MEM_AFFINITY_ENABLED);
> > > -            numamem = acpi_data_push(table_data, sizeof *numamem);
> > > -        }
> > > -
> > > -        size = di->size;
> > > -
> > > -        flags = MEM_AFFINITY_ENABLED;
> > > -        if (di->hotpluggable) {
> > > -            flags |= MEM_AFFINITY_HOTPLUGGABLE;
> > > -        }
> > > -        if (is_nvdimm) {
> > > -            flags |= MEM_AFFINITY_NON_VOLATILE;
> > > -        }
> > > -
> > > -        build_srat_memory(numamem, di->addr, size, di->node, flags);
> > > -    }
> > > -
> > > -    qapi_free_MemoryDeviceInfoList(info_list);
> > > -}
> > > -
> > >  static void
> > >  build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
> > >  {
> > > @@ -2414,10 +2356,19 @@ build_srat(GArray *table_data, BIOSLinker 
> > > *linker, MachineState *machine)
> > >          build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
> > >      }
> > >  
> > > +    /*
> > > +     * Entry is required for Windows to enable memory hotplug in OS
> > > +     * and for Linux to enable SWIOTLB when booted with less than
> > > +     * 4G of RAM. Windows works better if the entry sets proximity
> > > +     * to the highest NUMA node in the machine.
> > > +     * Memory devices may override proximity set by this entry,
> > > +     * providing _PXM method if necessary.
> > > +     */
> > >      if (hotplugabble_address_space_size) {
> > > -        build_srat_hotpluggable_memory(table_data, 
> > > machine->device_memory->base,
> > > -                                       hotplugabble_address_space_size,
> > > -                                       pcms->numa_nodes - 1);
> > > +        numamem = acpi_data_push(table_data, sizeof *numamem);
> > > +        build_srat_memory(numamem, machine->device_memory->base,
> > > +                          hotplugabble_address_space_size, 
> > > pcms->numa_nodes - 1,
> > > +                          MEM_AFFINITY_HOTPLUGGABLE | 
> > > MEM_AFFINITY_ENABLED);
> > >      }
> > >  
> > >      build_header(linker, table_data,
> > >   
> > 
> > So this reverts both 10efd7e108 (which only moved the regression around)
> > and the earlier 848a1cc1e (which had introduced the regression in the
> > first place).
> > 
> > If I understand correctly, the issue that 848a1cc1e was meant to solve
> > was that "-device nvdimm,node=..." could be passed by the user such that
> > it lead to "proximity domain mismatch". (What was the higher-level goal
> > with that BTW?)
> > 
> > However, that mismatch could as well be avoided by simply not passing
> > such "node=..." properties. Is that right?
> >
> > If so, should we perhaps add another patch (temporarily), beyond this
> > revert, that identifies the misconfig in question, and prints a warning
> > about it?
> not exactly, before the patch node=... influenced only _PXM which is supposed
> to be evaluated after SRAT table and override whatever was in static table.
> 
> The patch, as I understood it, was about ACPI spec compliance where nvdimm SPA
> in NFIT table should have a corresponding entry in SRAT table with
> MEM_AFFINITY_NON_VOLATILE flag set.
> Whether it solves any real issues aren't known to me and typically for
> Intel contributed patches, author's email doesn't exists anymore so ...
> And even if it fixes some new nvdimm issue, I'd rather have it broken
> than regress memory hotplug that worked fine so far and wait for another
> nvdimm patch that won't break existing features.
> 
> > Anyway the revert seems justified to me.
> I've killed quite enough time trying to find out a way to keep
> everyone happy, but alas it's time to give up and let interested
> party to deal with regressions while introducing new stuff for
> nvdimm, hence this revert.

If the original author isn't available, I agree the best option
is to revert it to avoid the regression by now.

Reviewed-by: Eduardo Habkost <address@hidden>

However, have you considered keeping adding separate entries for
NVDIMM devices only (so we follow the spec), but add a single
(numa_nodes-1, MEM_AFFINITY_HOTPLUGGABLE|MEM_AFFINITY_ENABLED)
entry to the rest?

This way both use cases should still work as expected.  If
Windows break if using NVDIMM + Memory Hotplug at the same time,
maybe that's just a Windows bug we can't work around.

-- 
Eduardo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]