From: BALATON Zoltan
Subject: Re: [Qemu-ppc] [RFC PATCH 0/6] target/ppc: convert VMX instructions to use TCG vector operations
Date: Tue, 11 Dec 2018 04:03:36 +0100 (CET)
User-agent: Alpine 2.21.9999 (BSF 287 2018-06-16)

On Tue, 11 Dec 2018, David Gibson wrote:
On Mon, Dec 10, 2018 at 09:54:51PM +0100, BALATON Zoltan wrote:
Yes, I don't really know what these tests use. I think the "lame" test is
mostly floating point, but I tried it with "lame_vmx", which should at least
use some vector ops. The "mplayer -benchmark" test is more VMX dependent, based
on my previous profiling and testing with hardfloat, but I'm not sure. (When
testing these with hardfloat I found that lame benefited from
hardfloat but mplayer didn't, and more VMX related functions showed up with
mplayer, so I assumed it's more VMX bound.)

I should clarify here.  When I say "floating point" above, I'm not
meaning things using the regular FPU instead of the vector unit.  I'm
saying *anything* involving floating point calculations whether
they're done in the FPU or the vector unit.

OK, that clarifies it. I admit I was only testing these and didn't have time to look at what exactly changed.

The patches here don't convert all VMX instructions to use vector TCG
ops - they only convert a few, and those few are about using the
vector unit for integer (and logical) operations.  VMX instructions
involving floating point calculations are unaffected and will still
use soft-float.

What I said above about the lame test being more FPU intensive and mplayer more VMX intensive probably still holds: I've now retried on a Haswell i5 and got a 1-2% difference with lame_vmx and ~6% with mplayer. That's very little improvement, but if only some VMX instructions are expected to be faster then this may make sense.

These tests are not the best; maybe there are better ways to measure this, but I don't know of any.
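
Differences as small as 1-2% are easy to lose in run-to-run noise, so one way to make such comparisons more robust is to repeat each benchmark several times and compare medians. The following is only a minimal sketch of that idea, not the method used for the numbers above: the helper names `time_command` and `percent_improvement` are hypothetical, and it assumes the benchmark can be started as a single host command (for example a script that runs the test inside the guest).

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the benchmark's own output; only the elapsed time matters here.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_improvement(baseline_s, patched_s):
    """Time saved by the patched build, as a percentage of the baseline time."""
    return 100.0 * (baseline_s - patched_s) / baseline_s
```

For example, a baseline median of 100 s and a patched median of 94 s give `percent_improvement(100.0, 94.0) == 6.0`, which corresponds to the ~6% figure reported for mplayer; a negative result would mean the patched build was slower.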

Maybe the PPC softmmu should be reviewed and optimised by someone who knows

I'm not sure there is anyone who knows it at this point.  I probably
know it as well as anybody, and the ppc32 code scares me.  It's a
crufty mess and it would be nice to clean up, but that requires
someone with enough time and interest.

At least this seems to be a big bottleneck in PPC emulation, and one that's not being worked on. Others, like hardfloat and VMX, are not finished and there is still a lot to do, but they already show some results, whereas no one is looking at softmmu. I was just trying to draw some attention to the fact that softmmu may also need some optimisation, in the hope that someone would notice. I have some interest but not much time these days, and if it scares you, what should I say? I don't even understand most of it, so it would take a lot of time just to work out how it works and what would need to be done. So I hope someone with more time or knowledge shows up and maybe at least provides some hints on what may need to be done.

