qemu-devel

Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes


From: Daniel Henrique Barboza
Subject: Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes
Date: Tue, 30 Mar 2021 18:33:36 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1



On 3/29/21 8:51 PM, Igor Mammedov wrote:
On Tue, 23 Mar 2021 12:03:58 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote:
Kernel commit 4bce545903fa ("powerpc/topology: Update
topology_core_cpumask") caused a regression in the pseries machine when
defining certain SMP topologies [1]. The reasoning behind the change is
explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating
cpu_core_mask"). In short, the cpu_core_mask logic was causing trouble with
large VMs with lots of CPUs and was replaced by cpu_cpu_mask because, as
far as the kernel's understanding of SMP topologies goes, both masks are
equivalent.

Further discussion on the kernel mailing list [2] showed that the
powerpc kernel has always considered the number of sockets to be equal
to the number of NUMA nodes. The claim is that, for Power hardware at
least, it doesn't make sense to have two or more sockets in the same NUMA
node. The immediate conclusion is that all the SMP topologies the pseries
machine was supplying to the kernel with more than one socket in the same
NUMA node, as in [1], happened to be correctly represented in the kernel
by accident all these years.

There's a case to be made for virtual topologies being detached from
hardware constraints, allowing maximum flexibility to users. At the same
time, this freedom can't result in unrealistic hardware representations
being emulated. If the real hardware and the pseries kernel don't
support multiple chips/sockets in the same NUMA node, neither should we.

Starting in 6.0.0, each socket must match a unique NUMA node in the
pseries machine. qtest changes were made to adapt to this new
condition.

Oof.  I really don't like this idea.  It means a bunch of fiddly work
for users to match these up, for no real gain.  I'm also concerned
that this will require follow on changes in libvirt to not make this a
really cryptic and irritating point of failure.

yes, it is annoying to users, but we have been very slowly tightening
limitations on the NUMA path, especially where nonsense configs cause
problems on the guest side (and I suspect there are a few things left to
enforce in generic NUMA code (currently just warnings)). So it's nothing
new.

Do you mind elaborating a bit on the 'nonsense configs' bit? Do you mean
configs that are unrealistic because they aren't ACPI compliant? I always
thought that all possible SMP/NUMA topologies QEMU allows were ACPI/x86
compliant.

If x86 and PowerPC have restrictions in common, we might want to consider
putting this code in a common helper/place.



So if the limit applies only to new machine types it should be fine (i.e.
no existing VMs will suffer). Later on we can deprecate invalid
configurations altogether and just error out.

hmmm, I'm not warning the user about deprecated topologies in existing
guests in this patch. I guess it doesn't hurt to add a warning for
existing VMs.



Thanks,


DHB



[1] https://bugzilla.redhat.com/1934421
[2] 
https://lore.kernel.org/linuxppc-dev/daa5d05f-dbd0-05ad-7395-5d5a3d364fc6@gmail.com/

CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
CC: Cédric Le Goater <clg@kaod.org>
CC: Igor Mammedov <imammedo@redhat.com>
CC: Laurent Vivier <lvivier@redhat.com>
CC: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
  hw/ppc/spapr.c                 |  3 ++
  hw/ppc/spapr_numa.c            |  7 +++++
  include/hw/ppc/spapr.h         |  1 +
  tests/qtest/cpu-plug-test.c    |  4 +--
  tests/qtest/device-plug-test.c |  9 +++++-
  tests/qtest/numa-test.c        | 52 ++++++++++++++++++++++++++++------
  6 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d56418ca29..745f71c243 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4611,8 +4611,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true);
  */
 static void spapr_machine_5_2_class_options(MachineClass *mc)
 {
+    SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
     spapr_machine_6_0_class_options(mc);
     compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+    smc->pre_6_0_smp_topology = true;
 }
 
 DEFINE_SPAPR_MACHINE(5_2, "5.2", false);
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 779f18b994..0ade43dd79 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -163,6 +163,13 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
     int i, j, max_nodes_with_gpus;
     bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
 
+    if (!smc->pre_6_0_smp_topology &&
+        nb_numa_nodes != machine->smp.sockets) {
+        error_report("Number of CPU sockets must be equal to the number "
+                     "of NUMA nodes");
+        exit(EXIT_FAILURE);
+    }
+
     /*
      * For all associativity arrays: first position is the size,
      * position MAX_DISTANCE_REF_POINTS is always the numa_id,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 47cebaf3ac..98dc5d198a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -142,6 +142,7 @@ struct SpaprMachineClass {
     hwaddr rma_limit;          /* clamp the RMA to this size */
     bool pre_5_1_assoc_refpoints;
     bool pre_5_2_numa_associativity;
+    bool pre_6_0_smp_topology;
 
     bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
                           uint64_t *buid, hwaddr *pio,
diff --git a/tests/qtest/cpu-plug-test.c b/tests/qtest/cpu-plug-test.c
index a1c689414b..946b9129ea 100644
--- a/tests/qtest/cpu-plug-test.c
+++ b/tests/qtest/cpu-plug-test.c
@@ -118,8 +118,8 @@ static void add_pseries_test_case(const char *mname)
      data->machine = g_strdup(mname);
      data->cpu_model = "power8_v2.0";
      data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
-    data->sockets = 2;
-    data->cores = 3;
+    data->sockets = 1;
+    data->cores = 6;
      data->threads = 1;
      data->maxcpus = data->sockets * data->cores * data->threads;
diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
index 559d47727a..dd7d8268d2 100644
--- a/tests/qtest/device-plug-test.c
+++ b/tests/qtest/device-plug-test.c
@@ -91,7 +91,14 @@ static void test_spapr_cpu_unplug_request(void)
 {
     QTestState *qtest;
 
-    qtest = qtest_initf("-cpu power9_v2.0 -smp 1,maxcpus=2 "
+    /*
+     * Default smp settings will prioritize sockets over cores and
+     * threads, so '-smp 2,maxcpus=2' will add 2 sockets. However,
+     * the pseries machine requires a NUMA node for each socket
+     * (since 6.0.0). Specify sockets=1 to make life easier.
+     */
+    qtest = qtest_initf("-cpu power9_v2.0 "
+                        "-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 "
                         "-device power9_v2.0-spapr-cpu-core,core-id=1,id=dev0");
 
     /* similar to test_pci_unplug_request */
diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c
index dc0ec571ca..bb13f7131b 100644
--- a/tests/qtest/numa-test.c
+++ b/tests/qtest/numa-test.c
@@ -24,9 +24,17 @@ static void test_mon_explicit(const void *data)
     QTestState *qts;
     g_autofree char *s = NULL;
     g_autofree char *cli = NULL;
+    const char *arch = qtest_get_arch();
+
+    if (g_str_equal(arch, "ppc64")) {
+        cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
+                             "-numa node,nodeid=0,memdev=ram,cpus=0-3 "
+                             "-numa node,nodeid=1,cpus=4-7");
+    } else {
+        cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 "
+                             "-numa node,nodeid=1,cpus=4-7");
+    }
 
-    cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 "
-                         "-numa node,nodeid=1,cpus=4-7");
     qts = qtest_init(cli);
 
     s = qtest_hmp(qts, "info numa");
@@ -57,10 +65,18 @@ static void test_mon_partial(const void *data)
     QTestState *qts;
     g_autofree char *s = NULL;
     g_autofree char *cli = NULL;
+    const char *arch = qtest_get_arch();
+
+    if (g_str_equal(arch, "ppc64")) {
+        cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
+                             "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
+                             "-numa node,nodeid=1,cpus=4-5 ");
+    } else {
+        cli = make_cli(data, "-smp 8 "
+                             "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
+                             "-numa node,nodeid=1,cpus=4-5 ");
+    }
 
-    cli = make_cli(data, "-smp 8 "
-                   "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
-                   "-numa node,nodeid=1,cpus=4-5 ");
     qts = qtest_init(cli);
 
     s = qtest_hmp(qts, "info numa");
@@ -85,9 +101,17 @@ static void test_query_cpus(const void *data)
     QObject *e;
     QTestState *qts;
     g_autofree char *cli = NULL;
+    const char *arch = qtest_get_arch();
+
+    if (g_str_equal(arch, "ppc64")) {
+        cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
+                             "-numa node,memdev=ram,cpus=0-3 "
+                             "-numa node,cpus=4-7");
+    } else {
+        cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 "
+                             "-numa node,cpus=4-7");
+    }
 
-    cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 "
-                         "-numa node,cpus=4-7");
     qts = qtest_init(cli);
     cpus = get_cpus(qts, &resp);
     g_assert(cpus);
@@ -177,7 +201,7 @@ static void spapr_numa_cpu(const void *data)
     QTestState *qts;
     g_autofree char *cli = NULL;
 
-    cli = make_cli(data, "-smp 4,cores=4 "
+    cli = make_cli(data, "-smp 4,threads=1,cores=2,sockets=2 "
         "-numa node,nodeid=0,memdev=ram -numa node,nodeid=1 "
         "-numa cpu,node-id=0,core-id=0 "
         "-numa cpu,node-id=0,core-id=1 "
@@ -554,7 +578,17 @@ int main(int argc, char **argv)
 
     g_test_init(&argc, &argv, NULL);
 
-    qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split);
+    /*
+     * Starting on 6.0.0, for the pseries machine, '-smp 8' will only work
+     * if we have 8 NUMA nodes. If we specify 'smp 8,sockets=2' to match
+     * 2 NUMA nodes, the CPUs will be split as 0123/4567 instead of
+     * 0246/1357 that test_def_cpu_split expects. In short, this test is
+     * no longer valid for ppc64 in 6.0.0.
+     */
+    if (!g_str_equal(arch, "ppc64")) {
+        qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split);
+    }
+
     qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit);
     qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
     qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus);




