Re: [qemu-s390x] [PATCH v4] s390: diagnose 318 info reset and migration


From: Christian Borntraeger
Subject: Re: [qemu-s390x] [PATCH v4] s390: diagnose 318 info reset and migration support
Date: Tue, 14 May 2019 09:09:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1


On 13.05.19 13:46, Cornelia Huck wrote:
> On Mon, 13 May 2019 13:34:35 +0200
> David Hildenbrand <address@hidden> wrote:
> 
>> On 13.05.19 12:55, Christian Borntraeger wrote:
>>>
>>>
>>> On 13.05.19 11:57, David Hildenbrand wrote:  
>>>> On 13.05.19 11:51, Christian Borntraeger wrote:  
>>>>>
>>>>>
>>>>> On 13.05.19 11:40, David Hildenbrand wrote:  
>>>>>> On 13.05.19 11:34, Christian Borntraeger wrote:  
>>>>>>>
>>>>>>>
>>>>>>> On 13.05.19 10:03, David Hildenbrand wrote:  
>>>>>>>>>> +    if ((SCCB_SIZE - sizeof(ReadInfo)) / sizeof(CPUEntry) < S390_MAX_CPUS)
>>>>>>>>>> +        mc->max_cpus = S390_MAX_CPUS - 8;  
>>>>>>>>>
>>>>>>>>> This is too complicated, just set it always to 240.
>>>>>>>>>
>>>>>>>>> However, I am still not sure how to best handle this scenario. One
>>>>>>>>> solution is:
>>>>>>>>>
>>>>>>>>> 1. Set it statically to 240 for machines >= 4.1
>>>>>>>>> 2. Keep the old machines unmodified
>>>>>>>>> 3. Don't indicate the CPU feature for machines <= 4.0
>>>>>>>>>
>>>>>>>>> #3 is the problematic part, as it mixes host CPU features and machines.
>>>>>>>>> Bad. The host CPU model should always look the same on all machines. I
>>>>>>>>> don't like this.
>>>>>>>>>  
>>>>>>>>
>>>>>>>> FWIW, #3 is only an issue when modeling it via the CPU model, like
>>>>>>>> Christian suggested.
>>>>>>>>
>>>>>>>> I suggest the following
>>>>>>>>
>>>>>>>> 1. Set the max #cpus for 4.1 to 240 (already done)
>>>>>>>> 2. Keep it for the other machines unmodified (as suggested by Thomas)
>>>>>>>> 3. Create the layout of the SCCB depending on the machine type (to be done)
>>>>>>>>
>>>>>>>> If we want to model diag318 via a CPU feature (which makes sense for
>>>>>>>> migration):
>>>>>>>>
>>>>>>>> 4. Disable diag318 with a warning if used with a machine < 4.1
>>>>>>>>  
>>>>>>>
>>>>>>> I think there is a simpler solution. It is perfectly fine to fail the
>>>>>>> startup if we cannot fulfil the cpu model. So let's just allow 248 and
>>>>>>> allow this feature also for older machines. And if somebody chooses
>>>>>>> both at the same time, let's fail the startup.
>>>>>>
>>>>>> To which knob do you want to glue the layout of the SCLP response? Like
>>>>>> I described?  Do you mean instead of warning and masking the feature off
>>>>>> as I suggested, simply failing?  
>>>>>
>>>>> The sclp response will depend on the diag318 cpu model flag. If it's on,
>>>>> the sclp response will have it, otherwise not.
>>>>> - host-passthrough: not migration safe anyway
>>>>> - host-model: if the target has diag318 good, otherwise we reject
>>>>>   migration
>>>>>>
>>>>>> In that case, -machine ..-4.0 -cpu host will not work on new HW with new
>>>>>> KVM. Just noting.  
>>>>>
>>>>> Only if you have 248 CPUs (which is unlikely). My point was to do that
>>>>> for all machine levels.
>>>>>  
>>>>
>>>> The issue with this approach is that e.g. libvirt is not aware of this
>>>> restriction. It could query "max_cpus" and expand the host-cpu model,
>>>> but starting a guest with > 240 cpus would fail. Maybe this is acceptable. 
>>>>  
>>>
>>> As of today we do the cpu model check in the same way. libvirt actually
>>> tries to run QEMU and handles failures.
>>>
>>> For a failure, the user still has to use >240 CPUs in the XML. The only
>>> downside is that libvirt will not reject this right away.
>>>
>>> During startup we would then print an error message like
>>>
>>> "The diag318 cpu feature is only supported for 240 and less CPUs."
>>>
>>> This is of similar quality as
>>> "Selected CPU GA level is too new. Maximum supported model in the configuration: \'%s\'",
>>>   
>>
>> But that can be tested using the runability information if I am not wrong.
> 
> You mean the cpu level information, right?
> 
>>
>>> and others that we have today.
>>>
>>> So yes, I think this would be acceptable.  
>>
>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>> production either way. But you never know.
> 
> I think that using that many cpus is a more uncommon setup, but I still
> think that having to wait for actual failure

That can already happen today. You can easily specify z14 in the XML while
running on a zEC12, and you only get the error at startup. The question is
really: do you want to fail when the XML is defined or at startup? I think
startup is the better place here. It allows creating definitions that will
be useful in the future (pre-planning), e.g. if you know that you will
upgrade your machine or the code soon.

> is worse than being able
> to find out beforehand. Any way to make this discoverable?
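
For illustration, a minimal sketch of the startup check discussed above. It
is not the patch itself. The sizes (a 4096-byte SCCB, 16-byte CPU entries, a
128-byte read-info header that would grow to 256 bytes with the diag318 data)
and the names sccb_cpu_capacity() and check_cpu_config() are assumptions made
up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Assumed sizes, for illustration only (not taken from the patch):
 * a 4096-byte SCCB, 16-byte CPU entries, a 128-byte read-info header,
 * and a hypothetical 256-byte header once the diag318 data is added.
 */
enum {
    SCCB_SIZE_ASSUMED      = 4096,
    CPU_ENTRY_SIZE_ASSUMED = 16,
    READ_INFO_BASE_ASSUMED = 128,
    READ_INFO_EXT_ASSUMED  = 256,
};

/* Number of CPU entries that fit in the SCCB after the header. */
static unsigned int sccb_cpu_capacity(bool diag318)
{
    unsigned int header = diag318 ? READ_INFO_EXT_ASSUMED
                                  : READ_INFO_BASE_ASSUMED;

    return (SCCB_SIZE_ASSUMED - header) / CPU_ENTRY_SIZE_ASSUMED;
}

/*
 * Hypothetical startup check: returns true if the configuration can start,
 * otherwise writes a message into err and returns false.
 */
static bool check_cpu_config(bool diag318, unsigned int cpus,
                             char *err, size_t errlen)
{
    unsigned int limit = sccb_cpu_capacity(diag318);

    if (cpus <= limit) {
        return true;
    }
    if (diag318) {
        snprintf(err, errlen,
                 "The diag318 cpu feature is only supported for %u and less CPUs.",
                 limit);
    } else {
        snprintf(err, errlen, "At most %u CPUs are supported.", limit);
    }
    return false;
}
```

With these assumed sizes, the capacity works out to 248 entries without the
feature and 240 with it, so only a guest that asks for both diag318 and more
than 240 CPUs would fail at startup.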



