qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
Date: Tue, 31 Jul 2012 10:28:54 -0300
User-agent: Mutt/1.5.21 (2010-09-15)

Ping? Can anybody help Vinod, here? How can we get the attention of some
maintainer, to get this pulled in?


On Wed, Jul 18, 2012 at 06:26:52PM +0000, Vinod, Chegu wrote:
> Thanks Eduardo !
> 
> Hi Anthony, If you are ok with this patch...could you pl pull these changes 
> into upstream (or) 
> suggest who I should talk to get these changes in ?
> 
> Thanks!
> Vinod
> 
> -----Original Message-----
> From: Eduardo Habkost [mailto:address@hidden 
> Sent: Wednesday, July 18, 2012 10:15 AM
> To: Vinod, Chegu
> Cc: address@hidden; address@hidden; address@hidden
> Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's 
> -numa option
> 
> On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> > Changes since v3:
> >    - using bitmap_set() instead of set_bit() in numa_add() routine.
> >    - removed call to bitmak_zero() since bitmap_new() also zeros' the 
> > bitmap.
> >    - Rebased to the latest qemu.
> 
> Tested-by: Eduardo Habkost <address@hidden>
> Reviewed-by: Eduardo Habkost <address@hidden>
> 
> 
> > 
> > Changes since v2:
> >    - Using "unsigned long *" for the node_cpumask[].
> >    - Use bitmap_new() instead of g_malloc0() for allocation.
> >    - Don't rely on "max_cpus" since it may not be initialized
> >      before the numa related qemu options are parsed & processed.
> > 
> > Note: Continuing to use a new constant for allocation of
> >       the mask (This constant is currently set to 255 since
> >       with an 8bit APIC ID VCPUs can range from 0-254 in a
> >       guest. The APIC ID 255 (0xFF) is reserved for broadcast).
> > 
> > Changes since v1:
> > 
> >    - Use bitmap functions that are already in qemu (instead
> >      of cpu_set_t macro's from sched.h)
> >    - Added a check for endvalue >= max_cpus.
> >    - Fix to address the round-robbing assignment when
> >      cpu's are not explicitly specified.
> > -----------------------------------------------
> > 
> > v1:
> > --
> > 
> > The -numa option to qemu is used to create [fake] numa nodes and 
> > expose them to the guest OS instance.
> > 
> > There are a couple of issues with the -numa option:
> > 
> > a) Max VCPU's that can be specified for a guest while using
> >    the qemu's -numa option is 64. Due to a typecasting issue
> >    when the number of VCPUs is > 32 the VCPUs don't show up
> >    under the specified [fake] numa nodes.
> > 
> > b) KVM currently has support for 160VCPUs per guest. The
> >    qemu's -numa option has only support for upto 64VCPUs
> >    per guest.
> > This patch addresses these two issues.
> > 
> > Below are examples of (a) and (b)
> > 
> > a) >32 VCPUs are specified with the -numa option:
> > 
> > /usr/local/bin/qemu-system-x86_64 \
> > -enable-kvm \
> > 71:01:01 \
> > -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4
> > 
> > ...
> > Upstream qemu :
> > --------------
> > 
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 6 nodes
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41 node 0 
> > size: 131072 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 
> > 46 47 48 49 50 51 node 1 size: 131072 MB node 2 cpus: 20 21 22 23 24 
> > 25 26 27 28 29 52 53 54 55 56 57 58 59 node 2 size: 131072 MB node 3 
> > cpus: 30 node 3 size: 131072 MB node 4 cpus:
> > node 4 size: 131072 MB
> > node 5 cpus: 31
> > node 5 size: 131072 MB
> > 
> > With the patch applied :
> > -----------------------
> > 
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 6 nodes
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> > node 0 size: 131072 MB
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 131072 MB node 
> > 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 131072 MB node 3 
> > cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 131072 MB node 4 
> > cpus: 40 41 42 43 44 45 46 47 48 49 node 4 size: 131072 MB node 5 
> > cpus: 50 51 52 53 54 55 56 57 58 59 node 5 size: 131072 MB
> > 
> > b) >64 VCPUs specified with -numa option:
> > 
> > /usr/local/bin/qemu-system-x86_64 \
> > -enable-kvm \
> > -cpu 
> > Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl
> > ,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
> > 
> > ...
> > 
> > Upstream qemu :
> > --------------
> > 
> > only 63 CPUs in NUMA mode supported.
> > only 64 CPUs in NUMA mode supported.
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 8 nodes
> > node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73 node 0 size: 65536 MB 
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 
> > 51 74 75 76 77 78 79 node 1 size: 65536 MB node 2 cpus: 20 21 22 23 24 
> > 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61 node 2 size: 65536 MB 
> > node 3 cpus: 30 62 node 3 size: 65536 MB node 4 cpus:
> > node 4 size: 65536 MB
> > node 5 cpus:
> > node 5 size: 65536 MB
> > node 6 cpus: 31 63
> > node 6 size: 65536 MB
> > node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69 node 7 
> > size: 65536 MB
> > 
> > With the patch applied :
> > -----------------------
> > 
> > QEMU 1.1.50 monitor - type 'help' for more information
> > (qemu) info numa
> > 8 nodes
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> > node 0 size: 65536 MB
> > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 65536 MB node 
> > 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 65536 MB node 3 
> > cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 65536 MB node 4 cpus: 
> > 40 41 42 43 44 45 46 47 48 49 node 4 size: 65536 MB node 5 cpus: 50 51 
> > 52 53 54 55 56 57 58 59 node 5 size: 65536 MB node 6 cpus: 60 61 62 63 
> > 64 65 66 67 68 69 node 6 size: 65536 MB node 7 cpus: 70 71 72 73 74 75 
> > 76 77 78 79
> > 
> > Signed-off-by: Chegu Vinod <address@hidden>, Jim Hull 
> > <address@hidden>, Craig Hada <address@hidden>
> > ---
> >  cpus.c   |    3 ++-
> >  hw/pc.c  |    3 ++-
> >  sysemu.h |    3 ++-
> >  vl.c     |   43 +++++++++++++++++++++----------------------
> >  4 files changed, 27 insertions(+), 25 deletions(-)
> > 
> > diff --git a/cpus.c b/cpus.c
> > index b182b3d..acccd08 100644
> > --- a/cpus.c
> > +++ b/cpus.c
> > @@ -36,6 +36,7 @@
> >  #include "cpus.h"
> >  #include "qtest.h"
> >  #include "main-loop.h"
> > +#include "bitmap.h"
> >  
> >  #ifndef _WIN32
> >  #include "compatfd.h"
> > @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
> >  
> >      for (env = first_cpu; env != NULL; env = env->next_cpu) {
> >          for (i = 0; i < nb_numa_nodes; i++) {
> > -            if (node_cpumask[i] & (1 << env->cpu_index)) {
> > +            if (test_bit(env->cpu_index, node_cpumask[i])) {
> >                  env->numa_node = i;
> >              }
> >          }
> > diff --git a/hw/pc.c b/hw/pc.c
> > index c7e9ab3..2edcc07 100644
> > --- a/hw/pc.c
> > +++ b/hw/pc.c
> > @@ -48,6 +48,7 @@
> >  #include "memory.h"
> >  #include "exec-memory.h"
> >  #include "arch_init.h"
> > +#include "bitmap.h"
> >  
> >  /* output Bochs bios info messages */  //#define DEBUG_BIOS @@ -639,7 
> > +640,7 @@ static void *bochs_bios_init(void)
> >      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
> >      for (i = 0; i < max_cpus; i++) {
> >          for (j = 0; j < nb_numa_nodes; j++) {
> > -            if (node_cpumask[j] & (1 << i)) {
> > +            if (test_bit(i, node_cpumask[j])) {
> >                  numa_fw_cfg[i + 1] = cpu_to_le64(j);
> >                  break;
> >              }
> > diff --git a/sysemu.h b/sysemu.h
> > index 6540c79..4669348 100644
> > --- a/sysemu.h
> > +++ b/sysemu.h
> > @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2];  extern 
> > QEMUClock *rtc_clock;
> >  
> >  #define MAX_NODES 64
> > +#define MAX_CPUMASK_BITS 255
> >  extern int nb_numa_nodes;
> >  extern uint64_t node_mem[MAX_NODES];
> > -extern uint64_t node_cpumask[MAX_NODES];
> > +extern unsigned long *node_cpumask[MAX_NODES];
> >  
> >  #define MAX_OPTION_ROMS 16
> >  typedef struct QEMUOptionRom {
> > diff --git a/vl.c b/vl.c
> > index 46248b9..35a641b 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -28,6 +28,7 @@
> >  #include <errno.h>
> >  #include <sys/time.h>
> >  #include <zlib.h>
> > +#include "bitmap.h"
> >  
> >  /* Needed early for CONFIG_BSD etc. */  #include "config-host.h"
> > @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = 
> > QTAILQ_HEAD_INITIALIZER(fw_boot_order
> >  
> >  int nb_numa_nodes;
> >  uint64_t node_mem[MAX_NODES];
> > -uint64_t node_cpumask[MAX_NODES];
> > +unsigned long *node_cpumask[MAX_NODES];
> >  
> >  uint8_t qemu_uuid[16];
> >  
> > @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
> >      unsigned long long value, endvalue;
> >      int nodenr;
> >  
> > +    value = endvalue = 0ULL;
> > +
> >      optarg = get_opt_name(option, 128, optarg, ',') + 1;
> >      if (!strcmp(option, "node")) {
> >          if (get_param_value(option, 128, "nodeid", optarg) == 0) { @@ 
> > -970,27 +973,22 @@ static void numa_add(const char *optarg)
> >              }
> >              node_mem[nodenr] = sval;
> >          }
> > -        if (get_param_value(option, 128, "cpus", optarg) == 0) {
> > -            node_cpumask[nodenr] = 0;
> > -        } else {
> > +        if (get_param_value(option, 128, "cpus", optarg) != 0) {
> >              value = strtoull(option, &endptr, 10);
> > -            if (value >= 64) {
> > -                value = 63;
> > -                fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> > +            if (*endptr == '-') {
> > +                endvalue = strtoull(endptr+1, &endptr, 10);
> >              } else {
> > -                if (*endptr == '-') {
> > -                    endvalue = strtoull(endptr+1, &endptr, 10);
> > -                    if (endvalue >= 63) {
> > -                        endvalue = 62;
> > -                        fprintf(stderr,
> > -                            "only 63 CPUs in NUMA mode supported.\n");
> > -                    }
> > -                    value = (2ULL << endvalue) - (1ULL << value);
> > -                } else {
> > -                    value = 1ULL << value;
> > -                }
> > +                endvalue = value;
> > +            }
> > +
> > +            if (!(endvalue < MAX_CPUMASK_BITS)) {
> > +                endvalue = MAX_CPUMASK_BITS - 1;
> > +                fprintf(stderr,
> > +                    "A max of %d CPUs are supported in a guest\n",
> > +                     MAX_CPUMASK_BITS);
> >              }
> > -            node_cpumask[nodenr] = value;
> > +
> > +            bitmap_set(node_cpumask[nodenr], value, 
> > + endvalue-value+1);
> >          }
> >          nb_numa_nodes++;
> >      }
> > @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
> >  
> >      for (i = 0; i < MAX_NODES; i++) {
> >          node_mem[i] = 0;
> > -        node_cpumask[i] = 0;
> > +        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
> >      }
> >  
> >      nb_numa_nodes = 0;
> > @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
> >          }
> >  
> >          for (i = 0; i < nb_numa_nodes; i++) {
> > -            if (node_cpumask[i] != 0)
> > +            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
> >                  break;
> > +            }
> >          }
> >          /* assigning the VCPUs round-robin is easier to implement, guest 
> > OSes
> >           * must cope with this anyway, because there are BIOSes out 
> > there in @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
> >           */
> >          if (i == nb_numa_nodes) {
> >              for (i = 0; i < max_cpus; i++) {
> > -                node_cpumask[i % nb_numa_nodes] |= 1 << i;
> > +                set_bit(i, node_cpumask[i % nb_numa_nodes]);
> >              }
> >          }
> >      }
> > --
> > 1.7.1
> > 
> 
> -- 
> Eduardo

-- 
Eduardo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]