From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC] Use of host vector operations in host helper functions
Date: Sat, 13 Sep 2014 17:02:38 +0100
User-agent: mu4e 0.9.9.5; emacs 24.3.1

Richard Henderson writes:

> Most of the time, guest vector operations are rare enough that it doesn't
> really matter that we implement them with a loop around integer operations.
>
> But for target-alpha, there's one vector comparison operation that appears in
> every guest string operation, and is used heavily enough that it's in the top
> 10 functions in the profile: cmpbge (compare bytes greater or equal).

For a helper function to top the profile is pretty impressive. I wonder
how it compares when you break it down by basic blocks?
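For reference, the "loop around integer operations" style for cmpbge can be sketched as below. This is only an illustration of the instruction's semantics (bit i of the result is set when byte i of the first operand is unsigned greater than or equal to byte i of the second); the name cmpbge_scalar is hypothetical and this is not claimed to be the actual target-alpha helper.

```c
#include <stdint.h>

/* Hypothetical scalar sketch of Alpha cmpbge semantics:
 * result bit i is set when byte i of a >= byte i of b (unsigned). */
static uint64_t cmpbge_scalar(uint64_t a, uint64_t b)
{
    uint64_t res = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t ba = (uint8_t)(a >> (i * 8));   /* byte i of a */
        uint8_t bb = (uint8_t)(b >> (i * 8));   /* byte i of b */

        if (ba >= bb) {
            res |= 1ull << i;
        }
    }
    return res;
}
```

Eight dependent compare-and-set steps per call is the cost a vector version would be trying to remove.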

> I did some experiments, where I rewrote the function using gcc's "generic"
> vector types and builtin operations.
>
> <snip>
>
> GCC doesn't do a half-bad job on other hosts either:
>
> aarch64:
>   b4:   4f000400        movi    v0.4s, #0x0
>   b8:   4ea01c01        mov     v1.16b, v0.16b
>   bc:   4e081c00        mov     v0.d[0], x0
>   c0:   4e081c21        mov     v1.d[0], x1
>   c4:   6e213c00        cmhs    v0.16b, v0.16b, v1.16b
>   c8:   4e083c00        mov     x0, v0.d[0]
>   cc:   9200c000        and     x0, x0, #0x101010101010101
>   d0:   aa401c00        orr     x0, x0, x0, lsr #7
>   d4:   aa403800        orr     x0, x0, x0, lsr #14
>   d8:   aa407000        orr     x0, x0, x0, lsr #28
>   dc:   53001c00        uxtb    w0, w0
>   e0:   d65f03c0        ret
>
> Of course aarch64 *does* have an 8-byte vector size that gcc knows how to use.
> If I adjust the patch above to use it, only the first two insns are eliminated
> -- surely not a measurable difference.
>
> power7:
>   ...
>   vcmpgtub 13,0,1
>   vcmpequb 0,0,1
>   xxlor 32,45,32
>   ...
>
>
> But I guess the larger question here is: how much of this should we accept?
>
> (0) Ignore this and do nothing?
>
> (1) No general infrastructure.  Special case this one insn with
> #ifdef __SSE2__ and ignore anything else.

Not a big fan of special cases that are arch dependent.

> (2) Put in just enough infrastructure to know if compiler support for general
> vectors is available, and then use it ad hoc when such functions are shown to
> be high on the profile?
>
> (3) Put in more infrastructure and allow it to be used to implement most guest
> vector operations, possibly tidying their implementations?
<snip>

(4) Consider supporting generic vector operations in the TCG?

While making helper functions faster is good, I've wondered if there is
enough genericism across the various SIMD/vector operations that we could
add TCG ops to translate them? The ops could fall back to generic helper
functions using the GCC intrinsics if we know there is no decent
back-end support for them?
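A minimal sketch of what such a fallback helper might look like, assuming GCC-style generic vectors (the vector_size attribute, also accepted by Clang). The names HAVE_GENERIC_VECTORS, cmpbge_vec, cmpbge_fallback and cmpbge_dispatch are hypothetical and not taken from the QEMU tree or from the snipped patch. The bit gathering after the compare follows the and/orr sequence in the aarch64 listing quoted above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical feature test: assume any GNU-compatible compiler
 * supports generic vectors. A real check would likely be done at
 * configure time. */
#if defined(__GNUC__)
#define HAVE_GENERIC_VECTORS 1
#endif

/* Plain integer loop, used when generic vectors are unavailable. */
static uint64_t cmpbge_fallback(uint64_t a, uint64_t b)
{
    uint64_t res = 0;

    for (int i = 0; i < 8; i++) {
        if ((uint8_t)(a >> (i * 8)) >= (uint8_t)(b >> (i * 8))) {
            res |= 1ull << i;
        }
    }
    return res;
}

#ifdef HAVE_GENERIC_VECTORS
typedef uint8_t u8x8 __attribute__((vector_size(8)));

static uint64_t cmpbge_vec(uint64_t a, uint64_t b)
{
    u8x8 va, vb, ge;
    uint64_t m;

    memcpy(&va, &a, sizeof(va));
    memcpy(&vb, &b, sizeof(vb));

    /* Lane-wise unsigned compare: a true lane is all ones, a false
     * lane is zero. The cast only reinterprets the bits. */
    ge = (u8x8)(va >= vb);
    memcpy(&m, &ge, sizeof(m));

    /* Keep one bit per byte, then fold the eight bits down into
     * the low byte. */
    m &= 0x0101010101010101ull;
    m |= m >> 7;
    m |= m >> 14;
    m |= m >> 28;
    return m & 0xff;
}
#endif

/* Pick the vector version when the compiler supports it. */
static uint64_t cmpbge_dispatch(uint64_t a, uint64_t b)
{
#ifdef HAVE_GENERIC_VECTORS
    return cmpbge_vec(a, b);
#else
    return cmpbge_fallback(a, b);
#endif
}
```

The same shape would apply whether the selection happens in a helper, as here, or behind a TCG op whose back-end has no native support.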


-- 
Alex Bennée


