[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes
From: |
Cédric Le Goater |
Subject: |
Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes |
Date: |
Mon, 29 Mar 2021 17:32:26 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0 |
On 3/29/21 6:20 AM, David Gibson wrote:
> On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote:
>> On 3/25/21 3:10 AM, David Gibson wrote:
>>> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote:
>>>>
>>>>
>>>> On 3/22/21 10:03 PM, David Gibson wrote:
>>>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote:
>>>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update
>>>>>> topology_core_cpumask") cause a regression in the pseries machine when
>>>>>> defining certain SMP topologies [1]. The reasoning behind the change is
>>>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating
>>>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with
>>>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as
>>>>>> far as the kernel understanding of SMP topologies goes, both masks are
>>>>>> equivalent.
>>>>>>
>>>>>> Further discussions in the kernel mailing list [2] shown that the
>>>>>> powerpc kernel always considered that the number of sockets were equal
>>>>>> to the number of NUMA nodes. The claim is that it doesn't make sense,
>>>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The
>>>>>> immediate conclusion is that all SMP topologies the pseries machine were
>>>>>> supplying to the kernel, with more than one socket in the same NUMA node
>>>>>> as in [1], happened to be correctly represented in the kernel by
>>>>>> accident during all these years.
>>>>>>
>>>>>> There's a case to be made for virtual topologies being detached from
>>>>>> hardware constraints, allowing maximum flexibility to users. At the same
>>>>>> time, this freedom can't result in unrealistic hardware representations
>>>>>> being emulated. If the real hardware and the pseries kernel don't
>>>>>> support multiple chips/sockets in the same NUMA node, neither should we.
>>>>>>
>>>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the
>>>>>> pseries machine. qtest changes were made to adapt to this new
>>>>>> condition.
>>>>>
>>>>> Oof. I really don't like this idea. It means a bunch of fiddly work
>>>>> for users to match these up, for no real gain. I'm also concerned
>>>>> that this will require follow on changes in libvirt to not make this a
>>>>> really cryptic and irritating point of failure.
>>>>
>>>> Haven't though about required Libvirt changes, although I can say that
>>>> there
>>>> will be some amount to be mande and it will probably annoy existing users
>>>> (everyone that has a multiple socket per NUMA node topology).
>>>>
>>>> There is not much we can do from the QEMU layer aside from what I've
>>>> proposed
>>>> here. The other alternative is to keep interacting with the kernel folks to
>>>> see if there is a way to keep our use case untouched.
>>>
>>> Right. Well.. not necessarily untouched, but I'm hoping for more
>>> replies from Cédric to my objections and mpe's. Even with sockets
>>> being a kinda meaningless concept in PAPR, I don't think tying it to
>>> NUMA nodes makes sense.
>>
>> I did a couple of replies in different email threads but maybe not
>> to all. I felt it was going nowhere :/ Couple of thoughts,
>
> I think I saw some of those, but maybe not all.
>
>> Shouldn't we get rid of the socket concept, die also, under pseries
>> since they don't exist under PAPR ? We only have numa nodes, cores,
>> threads AFAICT.
>
> Theoretically, yes. I'm not sure it's really practical, though, since
> AFAICT, both qemu and the kernel have the notion of sockets (though
> not dies) built into generic code.
Yes. But, AFAICT, these topology notions have not reached "arch/powerpc"
and PPC Linux only has a NUMA node id, on pseries and powernv.
> It does mean that one possible approach here - maybe the best one - is
> to simply declare that sockets are meaningless under, so we simply
> don't expect what the guest kernel reports to match what's given to
> qemu.
>
> It'd be nice to avoid that if we can: in a sense it's just cosmetic,
> but it is likely to surprise and confuse people.
>
>> Should we diverged from PAPR and add extra DT properties "qemu,..." ?
>> There are a couple of places where Linux checks for the underlying
>> hypervisor already.
>>
>>>> This also means that
>>>> 'ibm,chip-id' will probably remain in use since it's the only place where
>>>> we inform cores per socket information to the kernel.
>>>
>>> Well.. unless we can find some other sensible way to convey that
>>> information. I haven't given up hope for that yet.
>>
>> Well, we could start by fixing the value in QEMU. It is broken
>> today.
>
> Fixing what value, exactly?
The value of the "ibm,chip-id" since we are keeping the property under
QEMU.
>> This is all coming from some work we did last year to evaluate our HW
>> (mostly for XIVE) on 2s, 4s, 16s systems on baremetal, KVM and PowerVM.
>> We saw some real problems because Linux did not have a clear view of the
>> topology. See the figures here :
>>
>> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210303174857.1760393-9-clg@kaod.org/
>>
>> The node id is a key parameter for system resource management, memory
>> allocation, interrupt affinity, etc. Linux scales much better if used
>> correctly.
>
> Well, sure. And we have all the ibm,associativity stuff to convey the
> node ids to the guest (which has its own problems, but not that are
> relevant here). What's throwing me is why getting node IDs correct
> has anything to do with socket numbers.
- [PATCH 0/2] pseries: SMP sockets must match NUMA nodes, Daniel Henrique Barboza, 2021/03/19
- [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Daniel Henrique Barboza, 2021/03/19
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, David Gibson, 2021/03/22
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Daniel Henrique Barboza, 2021/03/23
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, David Gibson, 2021/03/24
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Cédric Le Goater, 2021/03/25
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Daniel Henrique Barboza, 2021/03/25
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, David Gibson, 2021/03/29
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes,
Cédric Le Goater <=
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Daniel Henrique Barboza, 2021/03/29
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Igor Mammedov, 2021/03/29
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, David Gibson, 2021/03/30
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Michael Ellerman, 2021/03/31
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Cédric Le Goater, 2021/03/31
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, David Gibson, 2021/03/31
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Cédric Le Goater, 2021/03/31
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Daniel Henrique Barboza, 2021/03/31
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, Cédric Le Goater, 2021/03/31
- Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes, David Gibson, 2021/03/31