qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus


From: Eduardo Habkost
Subject: Re: [Qemu-ppc] [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled
Date: Mon, 29 May 2017 10:22:41 -0300
User-agent: Mutt/1.8.0 (2017-02-23)

On Mon, May 29, 2017 at 01:45:36PM +0200, Igor Mammedov wrote:
> On Fri, 26 May 2017 13:06:30 -0300
> Eduardo Habkost <address@hidden> wrote:
> 
> > On Tue, May 23, 2017 at 04:38:48PM +0200, Igor Mammedov wrote:
> > > It fixes/add missing _PXM object for non mapped CPU (x86)
> > > and missing fdt node (virt-arm).
> > > 
> > > It ensures that possible_cpus contains complete mapping if
> > > numa is enabled by the time machine_init() is executed.
> > > 
> > > As result non completely mapped CPUs:
> > >  1) appear in ACPI/fdt blobs
> > >  2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
> > >  3) allows to drop checks for has_node_id in numa only code,
> > >    reducing number of invariants incomplete mapping could produce
> > >  4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
> > >    (when CPU object is created) to machine_numa_finish_init() which
> > >    helps to fix [1, 2] and make possible_cpus complete source
> > >    of numa mapping available even before CPUs are created.
> > > 
> > > Signed-off-by: Igor Mammedov <address@hidden>  
> > 
> > Code that will be affected, but will keep the same behavior:
> > 
> > * pre_plug handler when node-id is omitted on device_add: it
> >   already sets node_id=0 if !has_node_id
> >   * However, pre_plug behavior is being changed when node-id is
> >     set. See below.
> > * build_srat() (both x86 and arm): already sets node_id = 0 if
> >   !has_node_id
> > * bochs_bios_init(): already uses g_new0(), so already had
> >   omitted entries set to 0.
> > 
> > Affected code that will change behavior with this patch:
> > 
> > * pre_plug handler when node-id is set:
> >   * See my reply to patch 1/5.  This patch restores the old
> >     behavior that rejected node-id != 0 on omitted CPUs.
> I'll look into it.
> 
> > * ACPI _PXM initialization
> > * arm/virt.c: numa-node-id will now be set for all CPUs
> > * QMP/HMP query-hotpluggable-cpus output
> >   * Desirable change, as now the QMP command will reflect what
> >     the guest is really seeing
> > 
> > Due to the _PXM and FDT changes, I have the same objection I have
> > for patch 4/5: if we are going to break user expectations when
> > using incomplete NUMA mappings, let's do that only once (when we
> > start rejecting incomplete NUMA mappings).
> 
> FDT/PXM change is bugfix to firmware that I insist on.
> Firmware should properly describe all CPUs instead of repeating
> partial CPU mapping nonsense.
> 
> Wrt _PXM change see
> (27111931 acpi: provide _PXM method for CPU devices if QEMU is started numa 
> enabled)
> provided we keep 0 node fallback by dropping 4/5
> linux guest will behave the same as without _PXM.

If it is a bug fix, then I agree with it.  Also, this doesn't
change the topology of the virtual hardware, so it shouldn't
break user expectations anyway.

> 
> And again there wasn't any compat code around that
> and guest saw node-id change when it's been restarted.
> 
> Wrt FDT change, it's the same and I also talked with
> Drew about it, his reply was that for ARM target
> they don't care much about NUMA as not all pieces are
> there to make pinning work on host side yet.
> 
> PS:
> There wasn't any promise made to keep firmware side
> bug compatible or versioned so users should not have
> expectations about it in the first place.

You are right about this.

> 
> Pls don't push for bug compat stuff in the firmware
> as it will rise false expectations (at least until
> we decide to care about it by default).

I won't.  Patch 4/5 is different because it makes firmware
describe different hardware.  But this one is firmware-only, so
you are right.

> 
> (I'd use firmware compat stuff only in cases where
> it breaks guest horribly, i.e. crash/fail boot)
> 
> 
> > A few extra comments about the has_node_id checks below:
> > 
> > > ---
> > >  hw/arm/virt-acpi-build.c |  4 +---
> > >  hw/core/machine.c        | 16 ++++++++++------
> > >  hw/i386/acpi-build.c     |  3 +--
> > >  hw/i386/pc.c             |  4 +---
> > >  numa.c                   |  7 +------
> > >  5 files changed, 14 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > index e585206..977a794 100644
> > > --- a/hw/arm/virt-acpi-build.c
> > > +++ b/hw/arm/virt-acpi-build.c
> > > @@ -496,12 +496,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
> > > VirtMachineState *vms)
> > >      srat->reserved1 = cpu_to_le32(1);
> > >  
> > >      for (i = 0; i < cpu_list->len; ++i) {
> > > -        int node_id = cpu_list->cpus[i].props.has_node_id ?
> > > -            cpu_list->cpus[i].props.node_id : 0;
> > >          core = acpi_data_push(table_data, sizeof(*core));
> > >          core->type = ACPI_SRAT_PROCESSOR_GICC;
> > >          core->length = sizeof(*core);
> > > -        core->proximity = cpu_to_le32(node_id);
> > > +        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
> > >          core->acpi_processor_uid = cpu_to_le32(i);
> > >          core->flags = cpu_to_le32(1);
> > >      }
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index aaf3cff..964b75d 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -720,19 +720,23 @@ static void machine_numa_finish_init(MachineState 
> > > *machine)
> > >          const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
> > >  
> > >          if (!cpu_slot->props.has_node_id) {
> > > -            if (default_mapping) {
> > > -                /* fetch default mapping from board and enable it */
> > > -                CpuInstanceProperties props = cpu_slot->props;
> > > -                props.has_node_id = true;
> > > -                machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > > -            } else {
> > > +            /* fetch default mapping from board and enable it */
> > > +            CpuInstanceProperties props = cpu_slot->props;
> > > +
> > > +            if (!default_mapping) {
> > >                  /* record slots with not set mapping,
> > >                   * TODO: make it hard error in future */
> > >                  char *cpu_str = cpu_slot_to_string(cpu_slot);
> > >                  g_string_append_printf(s, "%sCPU %d [%s]",
> > >                                         s->len ? ", " : "", i, cpu_str);
> > >                  g_free(cpu_str);
> > > +
> > > +                /* non mapped cpus used to fallback to node 0 */
> > > +                props.node_id = 0;
> > >              }
> > > +
> > > +            props.has_node_id = true;
> > > +            machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > >          }
> > >      }
> > >      if (s->len) {
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index afcadac..873880d 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -2335,8 +2335,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
> > > MachineState *machine)
> > >      srat->reserved1 = cpu_to_le32(1);
> > >  
> > >      for (i = 0; i < apic_ids->len; i++) {
> > > -        int node_id = apic_ids->cpus[i].props.has_node_id ?
> > > -            apic_ids->cpus[i].props.node_id : 0;
> > > +        int node_id = apic_ids->cpus[i].props.node_id;  
> > 
> > What about an assert(props.has_node_id) here?
> I considered it but dropped idea, as it's mostly useless
> since build_srat() is called only when numa is enabled.

I just suggested it because I would be more confident that the
new code that ensures has_node_id is set for all slots is working
and won't break if we refactor it.  But not a big deal.

> 
> and sticking asserts in every place just in case doesn't
> seem to me like good idea as it would lead to absurd 
> asserts after every line.
> 
> Considering the new code behaves the same in case !has_node_id
> I don't see any reason to put assert here, the same goes
> for 'Ditto" comment in the next hunk.
> 
> But if you insist I can' add assert().

I won't mind if you decide to not add it.

> 
[...]
> > >      }
> > >      for (i = 0; i < nb_numa_nodes; i++) {
> > >          numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> > > diff --git a/numa.c b/numa.c
> > > index e30702e..4ef6dea 100644
> > > --- a/numa.c
> > > +++ b/numa.c
> > > @@ -513,17 +513,12 @@ void numa_cpu_pre_plug(const CPUArchId *slot, 
> > > DeviceState *dev, Error **errp)
> > >      int node_id = object_property_get_int(OBJECT(dev), "node-id", 
> > > &error_abort);
> > >  
> > >      if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> > > -        /* by default CPUState::numa_node was 0 if it wasn't set 
> > > explicitly
> > > -         * TODO: make it error when incomplete numa mapping support is 
> > > removed
> > > -         */
> > > -        node_id = 0;
> > > -
> > >          /* due to bug in libvirt, it doesn't pass node-id from props on
> > >           * device_add as expected, so we have to fix it up here */
> > >          if (slot->props.has_node_id) {
> > >              node_id = slot->props.node_id;
> > > +            object_property_set_int(OBJECT(dev), node_id, "node-id", 
> > > errp);
> > >          }
> > > -        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);  
> > 
> > Why not removing the has_node_id check completely, here?
> because numa_cpu_pre_plug() is called unconditionally
> As alternative, I can replace has_node_id check with assert
> and do following from callers:
> 
>  if (nb_numa_nodes) {
>      numa_cpu_pre_plug()
>  }
> 
> 
> > In case this patch get respun, I suggest also removing the
> > has_node_id check at: build_cpus_aml() _PXM code.
> that would be wrong, build_cpus_aml() is called for both
> numa and non numa use-cases, there shouldn't be _PXM
> object if numa is not enabled, hence the check.
> 
>  

Oh, I forgot about the non-NUMA cases.  You're right.


-- 
Eduardo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]