[Qemu-devel] [PATCH V13 09/13] NUMA: set guest numa nodes memory policy
From: Wanlong Gao
Subject: [Qemu-devel] [PATCH V13 09/13] NUMA: set guest numa nodes memory policy
Date: Tue, 17 Sep 2013 11:16:21 +0800
Set the guest numa nodes memory policies using the mbind(2)
system call node by node.
After this patch, guest node memory policies can be set through
QEMU options. This aims to solve the performance problem caused by
guest memory accesses that cross host NUMA nodes.
In addition, when PCI passthrough is used, the directly attached
device performs DMA to and from the memory of the QEMU process,
so all guest pages are pinned by get_user_pages():
KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    => kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
          => kvm_pin_pages()
So, with a directly attached device, the page count of every guest page
is raised by one and page migration no longer works. AutoNUMA does not
work either. We should therefore set the guest nodes' memory allocation
policies before the pages are actually mapped.
Signed-off-by: Andre Przywara <address@hidden>
Signed-off-by: Wanlong Gao <address@hidden>
---
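Note on the "node by node" layout: the sketch below is illustrative only
and is not part of the patch. It assumes guest nodes occupy guest RAM back
to back in index order, and that the host node mask is a plain array of
unsigned long. The helper names node_ram_offset(), last_set_bit() and
mbind_maxnode() are hypothetical. It shows how the start offset of one
node's slice and the maxnode argument (highest host node index plus 2, as
in the workaround used by set_node_mem_policy()) could be derived.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: byte offset of guest node `nodeid` inside guest RAM,
 * assuming nodes are laid out back to back in index order. */
static uint64_t node_ram_offset(const uint64_t *node_mem, size_t nodeid)
{
    uint64_t offset = 0;

    for (size_t i = 0; i < nodeid; i++) {
        offset += node_mem[i];
    }
    return offset;
}

/* Hypothetical helper: index of the highest set bit in a bitmap of `nbits`
 * bits, or `nbits` when no bit is set. */
static size_t last_set_bit(const unsigned long *mask, size_t nbits)
{
    for (size_t bit = nbits; bit-- > 0; ) {
        if (mask[bit / BITS_PER_ULONG] & (1UL << (bit % BITS_PER_ULONG))) {
            return bit;
        }
    }
    return nbits;
}

/* maxnode as computed in the patch: highest host node index plus 2.
 * Assumes the mask is not empty; an empty mask would yield nbits + 2. */
static unsigned long mbind_maxnode(const unsigned long *mask, size_t nbits)
{
    return (unsigned long)last_set_bit(mask, nbits) + 2;
}
```

With node sizes {1024, 2048, 512}, node 2 would start at offset 3072, and a
mask containing host nodes 0 and 3 would give maxnode = 5.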
numa.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)
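Note on the mode argument: the following is a minimal sketch, not code
from the patch, of how a policy value and a relative/static choice combine
into the mode passed to mbind(2). The EX_ names are hypothetical stand-ins
carrying the values of the Linux MPOL_* constants, and the sketch assumes,
as node_parse_bind_mode() appears to, that the guest-side policy
enumeration shares those values.

```c
#include <stdbool.h>

/* Hypothetical EX_ names; the values match MPOL_* in the Linux UAPI
 * header <linux/mempolicy.h>. */
enum {
    EX_MPOL_DEFAULT    = 0,
    EX_MPOL_PREFERRED  = 1,
    EX_MPOL_BIND       = 2,
    EX_MPOL_INTERLEAVE = 3,
};
#define EX_MPOL_F_STATIC_NODES   (1 << 15)
#define EX_MPOL_F_RELATIVE_NODES (1 << 14)

/* Hypothetical helper: combine a policy with exactly one of the two
 * mutually exclusive node-interpretation flags. Unknown policies fall
 * back to the default policy with no flags. */
static int compose_bind_mode(int policy, bool relative)
{
    switch (policy) {
    case EX_MPOL_DEFAULT:
    case EX_MPOL_PREFERRED:
    case EX_MPOL_BIND:
    case EX_MPOL_INTERLEAVE:
        break;
    default:
        return EX_MPOL_DEFAULT;
    }

    return policy | (relative ? EX_MPOL_F_RELATIVE_NODES
                              : EX_MPOL_F_STATIC_NODES);
}
```

For example, a bind policy with static nodes would yield 0x8002, and an
interleave policy with relative nodes would yield 0x4003.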
diff --git a/numa.c b/numa.c
index da4dbbd..915a67a 100644
--- a/numa.c
+++ b/numa.c
@@ -27,6 +27,16 @@
#include "qapi-visit.h"
#include "qapi/opts-visitor.h"
#include "qapi/dealloc-visitor.h"
+#include "exec/memory.h"
+
+#ifdef __linux__
+#include <sys/syscall.h>
+#ifndef MPOL_F_RELATIVE_NODES
+#define MPOL_F_RELATIVE_NODES (1 << 14)
+#define MPOL_F_STATIC_NODES (1 << 15)
+#endif
+#endif
+
QemuOptsList qemu_numa_opts = {
.name = "numa",
.implied_opt_name = "type",
@@ -228,6 +238,75 @@ void set_numa_nodes(void)
}
}
+#ifdef __linux__
+static int node_parse_bind_mode(unsigned int nodeid)
+{
+ int bind_mode;
+
+ switch (numa_info[nodeid].policy) {
+ case NUMA_NODE_POLICY_DEFAULT:
+ case NUMA_NODE_POLICY_PREFERRED:
+ case NUMA_NODE_POLICY_MEMBIND:
+ case NUMA_NODE_POLICY_INTERLEAVE:
+ bind_mode = numa_info[nodeid].policy;
+ break;
+ default:
+ bind_mode = NUMA_NODE_POLICY_DEFAULT;
+ return bind_mode;
+ }
+
+ bind_mode |= numa_info[nodeid].relative ?
+ MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
+
+ return bind_mode;
+}
+#endif
+
+static int set_node_mem_policy(int nodeid)
+{
+#ifdef __linux__
+ void *ram_ptr;
+ RAMBlock *block;
+ ram_addr_t len, ram_offset = 0;
+ int bind_mode;
+ int i;
+
+ QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+ if (!strcmp(block->mr->name, "pc.ram")) {
+ break;
+ }
+ }
+
+ if (block->host == NULL) {
+ return -1;
+ }
+
+ ram_ptr = block->host;
+ for (i = 0; i < nodeid; i++) {
+ len = numa_info[i].node_mem;
+ ram_offset += len;
+ }
+
+ len = numa_info[nodeid].node_mem;
+ bind_mode = node_parse_bind_mode(nodeid);
+ unsigned long *nodes = numa_info[nodeid].host_mem;
+
+ /* This is a workaround for a long standing bug in Linux'
+ * mbind implementation, which cuts off the last specified
+ * node. To stay compatible should this bug be fixed, we
+ * specify one more node and zero this one out.
+ */
+ unsigned long maxnode = find_last_bit(nodes, MAX_NODES);
+ if (syscall(SYS_mbind, ram_ptr + ram_offset, len, bind_mode,
+ nodes, maxnode + 2, 0)) {
+ perror("mbind");
+ return -1;
+ }
+#endif
+
+ return 0;
+}
+
void set_numa_modes(void)
{
CPUState *cpu;
@@ -240,4 +319,11 @@ void set_numa_modes(void)
}
}
}
+
+ for (i = 0; i < nb_numa_nodes; i++) {
+ if (set_node_mem_policy(i) == -1) {
+ fprintf(stderr,
+ "qemu: can not set host memory policy for node%d\n", i);
+ }
+ }
}
--
1.8.4.99.gd2dbd39