Re: [Qemu-ppc] [Qemu-devel] [PATCH 0/7] spapr: rework memory nodes

From: Alexey Kardashevskiy
Subject: Re: [Qemu-ppc] [Qemu-devel] [PATCH 0/7] spapr: rework memory nodes
Date: Tue, 17 Jun 2014 15:51:35 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 06/17/2014 06:51 AM, Eduardo Habkost wrote:
> On Mon, Jun 16, 2014 at 06:16:29PM +1000, Alexey Kardashevskiy wrote:
>> On 06/16/2014 05:53 PM, Alexey Kardashevskiy wrote:
>>> c4177479 "spapr: make sure RMA is in first mode of first memory node"
>>> introduced a regression which prevents running guests with a memoryless
>>> NUMA node#0, which may happen on real POWER8 boxes and which would make
>>> sense to debug in QEMU.
>>> The aim of this patchset is to fix that and also to fix various code
>>> problems in memory node generation.
>>> These 2 patches could be merged (the resulting patch looks rather ugly):
>>> spapr: Use DT memory node rendering helper for other nodes
>>> spapr: Move DT memory node rendering to a helper
>>> Please comment. Thanks!
>> Sure enough, I forgot to add an example of what I am trying to run without
>> errors and warnings:
>> /home/aik/qemu-system-ppc64 \
>> -enable-kvm \
>> -machine pseries \
>> -nographic \
>> -vga none \
>> -drive id=id0,if=none,file=virtimg/fc20_24GB.qcow2,format=qcow2 \
>> -device scsi-disk,id=id1,drive=id0 \
>> -m 2080 \
>> -smp 8 \
>> -numa node,nodeid=0,cpus=0-7,memory=0 \
>> -numa node,nodeid=2,cpus=0-3,mem=1040 \
>> -numa node,nodeid=4,cpus=4-7,mem=1040
> (Note: I will ignore the "cpus" argument for the discussion below.)

The example is quite bad; I should not have used the same CPUs in 2 nodes.
SPAPR allows this, but QEMU does not really support it, and I am not
touching that now.

> I understand now that the non-contiguous node IDs are guest-visible.
> But I still would like to understand the motivations for your use case,
> to understand which solution makes more sense.

One example is 2 CPUs on one die: one of the CPUs is connected to the memory
bus and the other is not; instead it is connected to the first CPU (via a
super fast bus), and the first CPU acts as a bridge.

> If you really want 5 nodes, you just need to write this:
>   -numa node,nodeid=0,cpus=0-7,memory=0 \
>   -numa node,nodeid=1 \
>   -numa node,nodeid=2,cpus=0-3,mem=1040 \
>   -numa node,nodeid=3 \
>   -numa node,nodeid=4,cpus=4-7,mem=1040
> If you just want 3 nodes, you can just write this:
>   -numa node,nodeid=0,cpus=0-7,memory=0 \
>   -numa node,nodeid=1,cpus=0-3,mem=1040 \
>   -numa node,nodeid=4,cpus=4-7,mem=1040
> But you seem to claim you need 3 nodes with non-contiguous IDs. In that
> case, which exactly is the guest-visible difference you expect to get
> between:
>   -numa node,nodeid=0,cpus=0-7,memory=0 \
>   -numa node,nodeid=1 \
>   -numa node,nodeid=2,cpus=0-3,mem=1040 \
>   -numa node,nodeid=3 \
>   -numa node,nodeid=4,cpus=4-7,mem=1040
> and
>   -numa node,nodeid=0,cpus=0-7,memory=0 \
>   -numa node,nodeid=2,cpus=0-3,mem=1040 \
>   -numa node,nodeid=4,cpus=4-7,mem=1040
> ?
> Because your patch is making both be exactly the same, and I guess you
> don't want that (otherwise you could simply use the 5-node command-line
> above and we wouldn't need patch 7/7).

If that is the canonical and kosher way of using NUMA in QEMU, ok, we can use
it. I just fail to see why we need to require node IDs to be consecutive
here. And it confuses me a bit as a user if I can add "-numa
node,nodeid=22" (no memory, no cpus) but do not get to see it in the guest.
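
To make the contiguous vs. non-contiguous distinction concrete, here is a
minimal sketch in plain Python. It is not QEMU code and does not reflect how
QEMU parses "-numa"; the helper names are made up for illustration, and the
"cpus" argument is left out as in the discussion above. It only lists which
node IDs a command line declares and which IDs below the highest one are
left as gaps.

```python
def parse_numa_nodes(options):
    """Map nodeid -> remaining properties for strings like 'node,nodeid=2,mem=1040'.

    Hypothetical helper for illustration only; not QEMU's option parser.
    """
    nodes = {}
    for opt in options:
        kind, *pairs = opt.split(",")
        if kind != "node":
            continue  # ignore anything that is not a node definition
        props = dict(pair.split("=", 1) for pair in pairs)
        nodes[int(props.pop("nodeid"))] = props
    return nodes


def missing_node_ids(nodes):
    """Return the IDs in 0..max(nodeid) that were never declared."""
    return sorted(set(range(max(nodes) + 1)) - set(nodes))


# 3 nodes with non-contiguous IDs
sparse = parse_numa_nodes([
    "node,nodeid=0,mem=0",
    "node,nodeid=2,mem=1040",
    "node,nodeid=4,mem=1040",
])

# 5 nodes, with empty nodes 1 and 3 filling the gaps
padded = parse_numa_nodes([
    "node,nodeid=0,mem=0",
    "node,nodeid=1",
    "node,nodeid=2,mem=1040",
    "node,nodeid=3",
    "node,nodeid=4,mem=1040",
])

print(sorted(sparse), missing_node_ids(sparse))
print(sorted(padded), missing_node_ids(padded))
```

The open question is whether the guest should be able to tell these two
configurations apart, i.e. whether the undeclared IDs in the first one are
any different from the explicitly declared empty nodes in the second.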

btw, how is this supposed to work with memory hotplug? The current "-numa"
does not support gaps in memory, and I would expect that we will need that.
Any plans here?


