From: BALATON Zoltan
Subject: Re: [Qemu-ppc] [RFC PATCH 0/6] target/ppc: convert VMX instructions to use TCG vector operations
Date: Mon, 10 Dec 2018 21:54:51 +0100 (CET)
User-agent: Alpine 2.21.9999 (BSF 287 2018-06-16)

On Mon, 10 Dec 2018, David Gibson wrote:
On Mon, Dec 10, 2018 at 01:33:53AM +0100, BALATON Zoltan wrote:
On Fri, 7 Dec 2018, Mark Cave-Ayland wrote:
This patchset is an attempt to improve VMX (AltiVec) instruction performance by making use of the new TCG vector operations where possible.

This is very welcome, thanks for doing this.

In order to use TCG vector operations, the registers must be accessible from cpu_env, whilst currently they are accessed via arrays of static TCG globals. Patches 1-3 are therefore mechanical patches which introduce access helpers for FPR, AVR and VSR registers using the supplied TCGv_i64 parameter.
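
As I understand it, such a helper boils down to a plain load/store of the guest register from cpu_env at its offset in CPUPPCState, instead of referencing a static TCG global. A minimal sketch of what I imagine the FPR variant looks like (written from memory, not the exact code in the patches; the fpr[] field name is my assumption):

/* Sketch only: assumes QEMU's target/ppc/translate.c context, where cpu_env
 * is a TCGv_ptr to CPUPPCState; the fpr[] field name is an assumption. */
static inline void get_fpr(TCGv_i64 dst, int regno)
{
    /* Load the guest FPR from the CPU state into a caller-supplied temp. */
    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
}

static inline void set_fpr(int regno, TCGv_i64 src)
{
    /* Store the caller-supplied value back into the guest FPR. */
    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
}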

Have you tried some benchmarks or tests to measure the impact of these
changes? I've tried the (very unscientific) benchmarks I've written about
before here:

http://lists.nongnu.org/archive/html/qemu-ppc/2018-07/msg00261.html

(which seem to use AltiVec/VMX instructions, but I'm not sure which) on mac99 with MorphOS, and I could not see any performance increase. I haven't run enough tests, but results with and without this series on master were mostly the same within a few percent, and I sometimes even saw lower performance with these patches than without. I haven't tried to find out why (no time for that now), so I can't really draw any conclusions from this. I'm also not sure whether I've actually tested what you've changed, whether these tests use instructions that your patches don't optimise yet, or whether the changes I've seen were just normal variation between runs; but I wonder if the increased number of temporaries could result in lower performance in some cases?

What was your host machine? IIUC this change will only improve
performance if the host tcg backend is able to implement TCG vector
ops in terms of vector ops on the host.

I tried it on an i5 650, which has: sse sse2 ssse3 sse4_1 sse4_2. I assume x86_64 should be supported, but I'm not sure what the CPU requirements are.
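
To spell out (mostly for myself) what the conversion means: an integer VMX op can now be emitted as a single gvec operation on the guest registers in cpu_env, and it is then up to the host TCG backend whether that expands to host SIMD instructions or falls back to scalar/helper code. A rough sketch of what I imagine a converted byte-wise vector add looks like (the avr[] offsets are my assumption about the CPUPPCState layout, not necessarily what the patches do):

/* Sketch only: emit vaddubm as a gvec op; the host backend decides whether
 * this becomes host vector instructions or an inline scalar fallback. */
static void gen_vaddubm_sketch(int vrt, int vra, int vrb)
{
    tcg_gen_gvec_add(MO_8,
                     offsetof(CPUPPCState, avr[vrt]),
                     offsetof(CPUPPCState, avr[vra]),
                     offsetof(CPUPPCState, avr[vrb]),
                     16, 16);   /* 128-bit AltiVec registers */
}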

In addition, this series only converts a subset of the integer and
logical vector instructions.  If your testcase is mostly floating
point (vectored or otherwise), it will still be softfloat and so not
see any speedup.

Yes, I don't really know exactly what these tests use, but I think the "lame" test is mostly floating point; I tried "lame_vmx", which should at least use some vector ops, and the "mplayer -benchmark" test is more VMX dependent based on my previous profiling and testing with hardfloat, but I'm not sure. (When testing these with hardfloat I found that lame benefited from hardfloat but mplayer didn't, and more VMX-related functions showed up with mplayer, so I assumed it's more VMX bound.)

I've tried to do some profiling again to find out what's used, but I can't get good results with the tools I have (oprofile stopped working since I updated my machine, Linux perf produces results that are hard for me to interpret, and I haven't checked whether gprof would work now; it didn't before). I have seen some vector-related helpers in the profile, though, so at least some vector ops are used. "helper_vperm" came out highest, at about 11th place (not sure where it is called from); other vector helpers were lower.

I don't remember the details now, but previously when testing hardfloat I wrote this: "I've looked at vperm which came out top in one of the profiles I've taken and on little endian hosts it has the loop backwards and also accesses vector elements from end to front which I wonder may be enough for the compiler to not be able to optimise it? But I haven't checked assembly. The altivec dependent mplayer video decoding test did not change much with hardfloat, it took 98% compared to master so likely altivec is dominating here." (Although this was with the PPC-specific vector helpers, before the VMX patch, so I'm not sure if this is still relevant.)
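
For reference, the pattern I meant is roughly the sketch below (a simplified illustration of what I described, written from memory, not the actual int_helper.c code): on a little-endian host the element loop runs backwards and the source elements are indexed from the end, which I suspect may make it harder for the compiler to auto-vectorise.

#include <stdint.h>

/* Simplified illustration only, not the QEMU source: on little-endian hosts
 * the loop runs from the last byte down to the first and the permute index
 * is mirrored, so elements are accessed end-to-front. */
typedef union {
    uint8_t u8[16];
} avr_sketch_t;

static void vperm_sketch(avr_sketch_t *r, const avr_sketch_t *a,
                         const avr_sketch_t *b, const avr_sketch_t *c)
{
    avr_sketch_t result;

    for (int i = 15; i >= 0; i--) {        /* loop runs backwards */
        int s = c->u8[i] & 0x1f;
        int index = 15 - (s & 0xf);        /* elements accessed from the end */

        result.u8[i] = (s & 0x10) ? b->u8[index] : a->u8[index];
    }
    *r = result;
}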

The top 10 in the profile were still related to low-level memory access and MMU management stuff, as I've found before:

http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg03609.html
http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg03704.html

I think implementing i2c for mac99 may help with this, and some other optimisations may also be possible, but I don't know enough about these to try that.

It also looks like with --enable-debug something is always flushing the TLB and blowing away the TB caches, so these will be at the top of the profile and likely dominate the runtime, which means I can't really use a profile to measure the impact of the VMX patch. Without --enable-debug I can't get call graphs, so I can't get a useful profile. I think I've looked at this before as well, but I can't remember now which check enabled by --enable-debug is responsible for the constant TB cache flushes and whether that could be avoided. I just don't use --enable-debug unless I need to debug something.

Maybe the PPC softmmu should be reviewed and optimised by someone who knows it...

Regards,
BALATON Zoltan


