qemu-devel

Re: [PATCH v2 1/1] nvdimm: add 'target-node' option


From: Liu, Jingqi
Subject: Re: [PATCH v2 1/1] nvdimm: add 'target-node' option
Date: Tue, 3 Aug 2021 13:55:16 +0800

Hi Igor,

On 7/29/2021 8:44 PM, Igor Mammedov wrote:
On Mon, 19 Jul 2021 10:01:53 +0800
Jingqi Liu <jingqi.liu@intel.com> wrote:

Linux kernel version 5.1 brings in support for the volatile-use of
persistent memory as a hotplugged memory region (KMEM DAX).
When this feature is enabled, persistent memory can be seen as a
separate memory-only NUMA node(s). This newly-added memory can be
selected by its unique NUMA node.

Add 'target-node' option for 'nvdimm' device to indicate this NUMA
node. It can be extended to a new node after all existing NUMA nodes.

The 'node' option of 'pc-dimm' device is to add the DIMM to an
existing NUMA node. The 'node' should be in the available NUMA nodes.
For KMEM DAX mode, persistent memory can be in a new separate
memory-only NUMA node. The new node is created dynamically.
So users use 'target-node' to control whether persistent memory
is added to an existing NUMA node or a new NUMA node.

An example of configuration is as follows.

Using the following QEMU command:
  -object memory-backend-file,id=nvmem1,share=on,mem-path=/dev/dax0.0,size=3G,align=2M \
  -device nvdimm,id=nvdimm1,memdev=nvmem1,label-size=128K,target-node=2

To list DAX devices:
  # daxctl list -u
  {
    "chardev":"dax0.0",
    "size":"3.00 GiB (3.22 GB)",
    "target_node":2,
    "mode":"devdax"
  }

To create a namespace in Device-DAX mode as a standard memory:
  $ ndctl create-namespace --mode=devdax --map=mem
To reconfigure DAX device from devdax mode to a system-ram mode:
  $ daxctl reconfigure-device dax0.0 --mode=system-ram

There are two existing NUMA nodes in Guest. After these operations,
persistent memory is configured as a separate Node 2 and
can be used as a volatile memory. This NUMA node is dynamically
created according to 'target-node'.


Well, I've looked at the spec and the series pointed to in the v1 thread,
and I don't really see a good reason to add a duplicate 'target-node'
property to NVDIMM that for all practical purposes serves the same
purpose as the already existing 'node' property.
The only thing it does on top of the existing 'node' property is
facilitate implicit creation of NUMA nodes on top of the
user-configured ones.

But what I really dislike is adding an implicit path that creates
NUMA nodes from a random place.

It just creates a mess and doesn't really work well, because you
will have to plumb into other code to account for implicit nodes
for it to work properly. (The first thing that comes to mind is that
HMAT configuration won't accept these implicit nodes; there might be
other places that will not work as expected.)
So I suggest abandoning this approach and using the already existing
NUMA CLI options to do what you need.

What you are trying to achieve can be done without this series,
as QEMU allows creating memory-only nodes and even empty ones
(for future hotplug) just fine.
The only requirement is that one must specify the complete planned
NUMA topology on the command line.

Here is an example that works for me:
    -machine q35,nvdimm=on \
    -m 4G,slots=4,maxmem=12G \
    -smp 4,cores=2 \
    -object memory-backend-ram,size=4G,policy=bind,host-nodes=0,id=ram-node0 \
    -numa node,nodeid=0,memdev=ram-node0
# explicitly assign all CPUs
    -numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=0,socket-id=1
# and create a cpu-less node for your nvdimm
    -numa node,nodeid=1

With that you can hotplug an nvdimm with the 'node=1' property set,
or provide it at startup, like this:
    -object memory-backend-file,id=mem1,share=on,mem-path=nvdimmfile,size=3G,align=2M \
    -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K,node=1
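
For convenience, the options above can be collected in one place. What
follows is only a sketch: it prints the arguments one per line and does
not start QEMU. The function name build_args and the NVDIMM_FILE variable
are made up for illustration; every option value is copied from the
example above.

```shell
#!/bin/sh
# Sketch only: prints the QEMU arguments from the example, one per line.
# build_args and NVDIMM_FILE are hypothetical names, not part of QEMU.
build_args() {
    # Backing file for the nvdimm; defaults to the name used in the example.
    nvdimm_file="${NVDIMM_FILE:-nvdimmfile}"

    set -- -machine q35,nvdimm=on
    set -- "$@" -m 4G,slots=4,maxmem=12G
    set -- "$@" -smp 4,cores=2
    set -- "$@" -object memory-backend-ram,size=4G,policy=bind,host-nodes=0,id=ram-node0
    set -- "$@" -numa node,nodeid=0,memdev=ram-node0
    # explicitly assign all CPUs to node 0
    set -- "$@" -numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=0,socket-id=1
    # cpu-less node reserved for the nvdimm
    set -- "$@" -numa node,nodeid=1
    # nvdimm provided at startup and placed in node 1
    set -- "$@" -object "memory-backend-file,id=mem1,share=on,mem-path=${nvdimm_file},size=3G,align=2M"
    set -- "$@" -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K,node=1

    printf '%s\n' "$@"
}

build_args
```

Building the list step by step keeps the comments next to the options
they describe, which a single backslash-continued command cannot do.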

After boot, numactl -H will show:

available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 3924 MB
node 0 free: 3657 MB
node distances:
node   0
   0:  10

and after initializing the nvdimm as a dax device and
reconfiguring it to system memory, it will show up
as a 'new' memory-only node:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 3924 MB
node 0 free: 3641 MB
node 1 cpus:
node 1 size: 896 MB
node 1 free: 896 MB
node distances:
node   0   1
   0:  10  20
   1:  20  10
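
To check this from a script, one option is to look for nodes whose
"cpus:" line is empty. This is a minimal sketch that assumes the
numactl -H layout shown above; the function name memory_only_nodes is
made up for illustration.

```shell
#!/bin/sh
# Sketch only: reads `numactl -H` style text on stdin and prints the ids of
# nodes that have no CPUs, i.e. lines of the form "node <id> cpus:" with
# nothing after the colon.
memory_only_nodes() {
    awk '$1 == "node" && $3 == "cpus:" && NF == 3 { print $2 }'
}

# Demo on the relevant lines of the output above.
printf '%s\n' \
    'available: 2 nodes (0-1)' \
    'node 0 cpus: 0 1 2 3' \
    'node 0 size: 3924 MB' \
    'node 1 cpus:' \
    'node 1 size: 896 MB' | memory_only_nodes
```

In the guest this would be used as "numactl -H | memory_only_nodes",
which for the output above prints 1.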

Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
[...]


Thanks for your comments and detailed example.
I agree with you.
I've tried these commands and they work.

Actually, I've provided similar commands to the customer before.
They just had some concerns about the need to create a complete NUMA
topology. But it seems to be the only way to create memory-only nodes
for future hotplugging.

I'll try to convince the customer to use it.

Thanks,
Jingqi




