
Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration


From: Christian Borntraeger
Subject: Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support
Date: Tue, 14 May 2019 11:04:12 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1


On 14.05.19 11:00, David Hildenbrand wrote:
> On 14.05.19 10:56, Christian Borntraeger wrote:
>>
>>
>> On 14.05.19 10:50, David Hildenbrand wrote:
>>> On 14.05.19 10:37, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 14.05.19 09:28, David Hildenbrand wrote:
>>>>>>>> But that can be tested using the runability information if I am not 
>>>>>>>> wrong.
>>>>>>>
>>>>>>> You mean the cpu level information, right?
>>>>>
>>>>> Yes, query-cpu-definition includes for each model runability information
>>>>> via "unavailable-features" (valid under the started QEMU machine).
>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> and others that we have today.
>>>>>>>>>
>>>>>>>>> So yes, I think this would be acceptable.  
>>>>>>>>
>>>>>>>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>>>>>>>> production either way. But you never know.
>>>>>>>
>>>>>>> I think that using that many cpus is a more uncommon setup, but I still
>>>>>>> think that having to wait for actual failure
>>>>>>
>>>>>> That can happen all the time today. You can easily say z14 in the xml
>>>>>> when on a zEC12. Only at startup you get the error. The question is really:
>>>>>
>>>>> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
>>>>> will work. Actually, even "-smp 248" will no longer work on affected
>>>>> machines.
>>>>>
>>>>> That is why I wonder if it is better to disable the feature and print a
>>>>> warning. Similar to CMMA, where we want to tolerate when CMMA is not
>>>>> possible in the current environment (huge pages).
>>>>>
>>>>> "Diag318 will not be enabled because it is not compatible with more than
>>>>> 240 CPUs".
>>>>>
>>>>> However, I still think that implementing support for more than one SCLP
>>>>> response page is the best solution. Guests will need adaptions for > 240
>>>>> CPUs with Diag318, but who cares? Existing setups will continue to work.
>>>>>
>>>>> Implementing that SCLP thingy will avoid any warnings and any errors. It
>>>>> just works from the QEMU perspective.
>>>>>
>>>>> Is implementing this realistic?
>>>>
>>>> Yes it is but it will take time. I will try to get this rolling. To make
>>>> progress on the diag318 thing, can we error on startup now and simply
>>>> remove that check when we have implemented a larger sccb? If we would
>>>> now do all kinds of "change the max number games" it would be harder to "fix".
>>>
>>>
>>> Another idea for temporary handling: Simply only indicate 240 CPUs to
>>> the guest if the response does not fit into a page. Once we have that
>>> SCLP thingy, this will be fixed. Guest migration back and forth should
>>> work, as the VCPUs are fully functional (and initially always stopped),
>>> the guest will simply not be able to detect them via SCLP when booting
>>> up, and therefore not use them.
>>
>> Yes, that looks like a good temporary solution. In fact if the guest relies
>> on simply probing it could even make use of the additional CPUs. It's just
>> the sclp response that is limited to 240 (or make it 247?)
> 
> I think the limiting factor was more than a single CPU, but I don't
> recall. We can do the math again and come up with the right number.

I think we need 8 bytes per CPU. With byte 134 we should still be ok with
247. Collin can do the math in the patch description.
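
For reference, the arithmetic behind these limits can be sketched as below. This is only an illustration under stated assumptions, not the math from the patch: the 4096-byte response size, the 16-byte CPU entry size, and the offsets at which the CPU entry list starts are all assumed values, picked because they reproduce the 248, 247 and 240 figures discussed in this thread. The helper names are hypothetical.

```python
# Sketch only: every constant here is an assumption, not taken from the patch.
SCCB_SIZE = 4096  # assumed: the SCLP response has to fit into a single 4 KiB page


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def max_cpu_entries(entries_offset, entry_size, sccb_size=SCCB_SIZE):
    """Number of whole CPU entries that fit after the fixed part of the response."""
    if entries_offset >= sccb_size:
        return 0
    return (sccb_size - entries_offset) // entry_size


def cpus_to_indicate(configured_cpus, entries_offset, entry_size):
    """Temporary handling: indicate only as many CPUs as fit into one page."""
    return min(configured_cpus, max_cpu_entries(entries_offset, entry_size))


# Assumed layout before the change: entries start at byte 128, 16 bytes each.
before = max_cpu_entries(128, 16)

# Assumed layout with byte 134 in use: the first free byte is 135, and the
# entry list is moved to the next 8-byte boundary.
after = max_cpu_entries(align_up(135, 8), 16)
```

With these assumed values, `before` is 248 and `after` is 247, and `cpus_to_indicate(248, align_up(135, 8), 16)` caps the indicated CPU count at 247. Starting the entry list at byte 256 instead would give the more conservative 240.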



