From: BALATON Zoltan
Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
Date: Fri, 21 Feb 2020 20:52:35 +0100 (CET)
User-agent: Alpine 2.22 (BSF 395 2020-01-19)

On Fri, 21 Feb 2020, Peter Maydell wrote:
> On Fri, 21 Feb 2020 at 18:04, BALATON Zoltan <address@hidden> wrote:
>> On Fri, 21 Feb 2020, Peter Maydell wrote:
>>> I think that is the wrong approach. Enabling use of the host FPU should not affect the accuracy of the emulation, which should remain bitwise-correct. We should only be using the host FPU to the extent that we can do that without discarding accuracy. As far as I'm aware that's how the hardfloat support for other guest CPUs that use it works.
>>
>> I don't know of a better approach. Please see section 4.2.2 Floating-Point Status and Control Register on page 124 in this document: https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0 especially the definition of the FR and FI bits and tell me how can we emulate these accurately and use host FPU.
>
> I don't know much about PPC, but if you can't emulate the guest architecture accurately with the host FPU, then don't use the host FPU. We used to have a kind of 'hardfloat'

I don't know whether it's possible to emulate these accurately and still use the host FPU, but nobody has done it for QEMU so far. If someone knows a way, please speak up and we can try to implement it. Unfortunately this would require more detailed knowledge of the different FPU implementations (at least X86_64, ARM and PPC, which are the most used platforms) than I have or am willing to spend time learning.

> support that was fast but inaccurate, but it was a mess because it meant that most guest code sort of worked but some guest code would confusingly misbehave. Deliberately not correctly emulating the guest CPU/FPU behaviour is not something I want us to return to. You're right that sometimes you can't get both speed and accuracy; other emulators (and especially ones which are trying to emulate games consoles) may choose to prefer speed over accuracy. For QEMU we prefer to choose accuracy over speed in this area.

OK, then how about keeping the default accurate but allowing users to opt in to using the host FPU, even if it's known to break some bits, for workloads where they need speed over accuracy and are happy to live with the limitation? Note that I've found that just removing the define that disables hardfloat for the PPC target, without any other changes, makes VMX vector instructions faster while the normal FPU gets a little slower. So disabling hardfloat already limits performance for guests using VMX, even if the host FPU is not used in the cases where it would cause inaccuracy. If you say we want accuracy and don't care about speed, then just don't disable hardfloat, as it helps at least VMX. We can then add an option that lets the user say hardfloat may be used even where it's inaccurate, so they can test their workload and decide for themselves.

Regards, BALATON Zoltan
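
For readers unfamiliar with the FR and FI bits discussed in this thread, here is a minimal Python sketch of what they record for a single double-precision addition. It is an illustration only, not QEMU code: the function name add_with_fr_fi is hypothetical, and the sketch assumes finite inputs, no overflow, and the default round-to-nearest-even mode. In the Power ISA, FI indicates that the rounded result was inexact, and FR indicates that the fraction was incremented during rounding. The point is that the host's rounded result alone carries neither bit; here they are recovered by comparing it against an exact rational sum, which is the kind of extra work a software implementation has to do.

```python
from fractions import Fraction


def add_with_fr_fi(a: float, b: float):
    """Return (result, FR, FI) for a + b (hypothetical helper).

    Assumes finite inputs, no overflow, round-to-nearest-even.
    """
    result = a + b                     # rounded result from the host FPU
    exact = Fraction(a) + Fraction(b)  # exact rational sum, no rounding
    rounded = Fraction(result)         # the rounded result as an exact value
    fi = rounded != exact              # FI: rounding changed the value
    fr = abs(rounded) > abs(exact)     # FR: rounding increased the magnitude
    return result, fr, fi


if __name__ == "__main__":
    print(add_with_fr_fi(1.0, 2.0))            # exact sum
    print(add_with_fr_fi(1.0, 2.0 ** -60))     # inexact, rounded down
    print(add_with_fr_fi(1.0, 3 * 2.0 ** -54)) # inexact, rounded up
```

A host FPU typically exposes an inexact status flag, which can serve for FI, but whether the fraction was rounded up (FR) usually has to be derived separately, which is part of the cost being debated above.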