qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EP


From: Babu Moger
Subject: Re: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EPYC
Date: Tue, 11 Aug 2020 16:03:58 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0


On 8/7/20 2:11 PM, Igor Mammedov wrote:
> On Fri, 7 Aug 2020 17:52:22 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
>> On Fri, Aug 07, 2020 at 11:32:51AM -0500, Babu Moger wrote:
>>> Adding a new check to warn the users to configure 'dies' when
>>> topology is numa configured. It makes it easy to build the
>>> topology for EPYC models.  
>>
>> This says you're adding a warning....
>>
>>>
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> ---
>>>  hw/i386/x86.c |    7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>>> index 67bee1bcb8..2a6ce56ef1 100644
>>> --- a/hw/i386/x86.c
>>> +++ b/hw/i386/x86.c
>>> @@ -138,6 +138,13 @@ void x86_cpus_init(X86MachineState *x86ms, int 
>>> default_cpu_version)
>>>  
>>>      /* Check for apicid encoding */
>>>      if (cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type)) {
>>> +        if ((ms->numa_state->num_nodes > 0) &&
>>> +            ms->numa_state->num_nodes != (ms->smp.sockets * 
>>> x86ms->smp_dies)) {
>>> +            error_setg(&error_fatal, "Numa configuration requires smp 
>>> 'dies' "
>>> +                       "parameter. Configure the cpu topology properly 
>>> with "
>>> +                       "max_cpus = sockets * dies * cores * threads");  
>>
>> ...but you're actually making this a fatal error, not a warning.
>>
>> I'm not sure this is really OK. Wouldn't this mean that existing VMs
>> deployed today, risk triggering this fatal error next time they
>> are booted, or live migrated.  If it is possible someone is using
>> such a config today, I don't think we can break it.
> 
> to begin with, users shouldn't have used 'dies' with initial impl. at all.
> (it was Intel introduced option and EPYC's added very similar internal node_id
> (removed by the next patch)).
> Now we are trying to consolidate this mess and reuse dies for EPYC.
> 
> EPYC was out in the since with 5.0 (though broken), users could start a VM 
> with
> such config but that would not be correct EPYC from apicid and cpuid point of 
> view.
> Guest OS might run if it doesn't know about EPYCs or behave wierdly (sub 
> optimal|crash|whatever)
> on seeing unexpected values.
> 
> If we are hell bound on keeping bugs of initial impl, then we should keep it 
> to 5.1<=
> machine version and do the right thing for newer ones.
> Though I'm not sure we should keep broken variant around (all we would get 
> from it is
> bug reports*/complains from users with end result of their config anyways).
> I'd rather error out with clear error message so user could fix their broken 
> config.
> 
> *) there is at least one thread/bz on qemu-devel where users are trying to run
> with EPYC and pick up options combination so it would produce sensible 
> topology.


I am still not sure what is the right approach here.  I can think of
couple of options.
1. If smp_dies != num_nodes then go ahead create the configuration with as
 many smp_dies and warn(but not error out) users about the mis-configuration.

2. Introduce it as a fix based on  machine version(5.1 >) like Igor
mentioned. I am not sure how to achieve that. I can look into that.

Thanks
Babu

> 
> 
>> Regards,
>> Daniel
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]