qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration


From: Collin Walling
Subject: Re: [Qemu-devel] [PATCH v4] s390: diagnose 318 info reset and migration support
Date: Thu, 16 May 2019 08:42:08 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 5/14/19 5:04 AM, Christian Borntraeger wrote:


On 14.05.19 11:00, David Hildenbrand wrote:
On 14.05.19 10:56, Christian Borntraeger wrote:


On 14.05.19 10:50, David Hildenbrand wrote:
On 14.05.19 10:37, Christian Borntraeger wrote:


On 14.05.19 09:28, David Hildenbrand wrote:
But that can be tested using the runability information if I am not wrong.

You mean the cpu level information, right?

Yes, query-cpu-definition includes for each model runability information
via "unavailable-features" (valid under the started QEMU machine).



and others that we have today.

So yes, I think this would be acceptable.

I guess it is acceptable yes. I doubt anybody uses that many CPUs in
production either way. But you never know.

I think that using that many cpus is a more uncommon setup, but I still
think that having to wait for actual failure

That can happen all the time today. You can easily say z14 in the xml when
on a zEC12. Only at startup you get the error. The question is really:

"-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
will work. Actually, even "-smp 248" will no longer work on affected
machines.

That is why wonder if it is better to disable the feature and print a
warning. Similar to CMMA, where want want to tolerate when CMMA is not
possible in the current environment (huge pages).

"Diag318 will not be enabled because it is not compatible with more than
240 CPUs".

However, I still think that implementing support for more than one SCLP
response page is the best solution. Guests will need adaptions for > 240
CPUs with Diag318, but who cares? Existing setups will continue to work.

Implementing that SCLP thingy will avoid any warnings and any errors. It
just works from the QEMU perspective.

Is implementing this realistic?

Yes it is but it will take time. I will try to get this rolling. To make
progress on the diag318 thing, can we error on startup now and simply
remove that check when when have implemented a larger sccb? If we would
now do all kinds of "change the max number games" would be harder to "fix".


Another idea for temporary handling: Simply only indicate 240 CPUs to
the guest if the response does not fit into a page. Once we have that
SCLP thingy, this will be fixed. Guest migration back and forth should
work, as the VCPUs are fully functional (and initially always stopped),
the guest will simply not be able to detect them via SCLP when booting
up, and therefore not use them.

Yes, that looks like a good temporary solution. In fact if the guest relies
on simply probing it could even make use of the additional CPUs. Its just
the sclp response that is limited to 240 (or make it 247?)

I think the limiting factor was more than a single CPU, but I don't
recall. We can do the math again and come up with the right number.

I think We need 8 byte per CPU. With byte 134 we should still be ok with
247. Collin can do the math in the patch description.



Yes 247 fits just fine. The 240 came up as extra space in case we expand
the rscpi even more in the future.

I used the

The SCCB_SIZE - sizeof(ReadInfo)) / sizeof(CPUEntry) < S390_MAX_CPUS

calculation as an example of what we could do if the SCCB_SIZE ever
increases (it would allow older machines to retroactively allow diag318
if they can also support a larger SCCB... but thinking out loud, there
might be many more moving parts that would make this preemptive approach
too tricky to implement today).




reply via email to

[Prev in Thread] Current Thread [Next in Thread]