qemu-arm

Re: [Qemu-arm] [Qemu-devel] [RFC PATCH 0/9] TCG Vector types and example conversion


From: Richard Henderson
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC PATCH 0/9] TCG Vector types and example conversion
Date: Fri, 18 Aug 2017 06:44:44 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 08/18/2017 04:33 AM, Kirill Batuzov wrote:
> From my own experiments some time ago:
> 
> (1) translating vector instructions to vector instructions in TCG is faster than
> 
> (2) translating vector instructions to a series of scalar instructions in TCG,
> which is faster than*
> 
> (3) translating vector instructions to single helper calls, which is faster
> than*
> 
> (4) translating vector instructions to helper calls for each vector element.
> 
> (*) (2) and (3) may swap places for some complicated instructions.
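
To make the difference between schemes (3) and (4) concrete, here is a toy model in plain C. It is not TCG code: ordinary C functions stand in for helper calls, and every name (add_elem_helper, add_vec_helper, helper_calls, NELEM) is hypothetical. The point is only that scheme (4) pays the call overhead once per element, while scheme (3) pays it once per guest instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: plain C functions stand in for TCG helper calls.
 * All names here are hypothetical, not QEMU APIs. */
#define NELEM 4                /* e.g. a 128-bit register as 4 x 32-bit lanes */

static unsigned helper_calls;  /* counts simulated helper invocations */

/* Scheme (4): one helper call per vector element. */
static uint32_t add_elem_helper(uint32_t a, uint32_t b)
{
    helper_calls++;
    return a + b;
}

static void translate_per_element(uint32_t *d, const uint32_t *a,
                                  const uint32_t *b)
{
    for (size_t i = 0; i < NELEM; i++) {
        d[i] = add_elem_helper(a[i], b[i]);
    }
}

/* Scheme (3): a single helper call processes the whole vector. */
static void add_vec_helper(uint32_t *d, const uint32_t *a, const uint32_t *b)
{
    helper_calls++;
    for (size_t i = 0; i < NELEM; i++) {
        d[i] = a[i] + b[i];
    }
}

static void translate_single_helper(uint32_t *d, const uint32_t *a,
                                    const uint32_t *b)
{
    add_vec_helper(d, a, b);
}
```

Both produce the same lanes; only the number of calls differs (four versus one for a 4-lane add).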

This was my gut feeling as well, with the caveat that for the ARM SVE case of
2048-bit registers we cannot afford to expand inline, due to generated code size.
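
A rough back-of-the-envelope count shows why. The sketch below assumes, purely for illustration, that expanding one guest vector add inline costs a fixed number of host instructions per host-sized chunk (for example two loads, one add and one store, so 4); the function name inline_insns is hypothetical.

```c
#include <assert.h>

/* Rough count of host instructions needed to expand ONE guest vector
 * operation inline.  Assumption (illustrative only): every host-sized
 * chunk costs insns_per_chunk instructions, e.g. load, load, op, store. */
static unsigned inline_insns(unsigned vec_bits, unsigned chunk_bits,
                             unsigned insns_per_chunk)
{
    assert(chunk_bits != 0 && vec_bits % chunk_bits == 0);
    return (vec_bits / chunk_bits) * insns_per_chunk;
}
```

Under that assumption a 2048-bit SVE add needs 2048/64 = 32 chunks, i.e. 128 instructions with 64-bit scalar ops, or 64 instructions with 128-bit host vectors, for every guest instruction; a 128-bit NEON add needs only 8. An out-of-line helper call stays roughly constant in size regardless of vector length.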

> ARM (at least ARM32; I have not checked aarch64 in this regard) uses the
> last and slowest scheme. As far as I understand, you want to change
> it to the third approach. That approach is used in the SSE emulation; maybe
> you can use a similar structure of helpers?
> 
> I still hope to finish my own series implementing the first
> approach. I apologize for the long delay since the last update and hope to
> send the next version sometime next week. I do not think our two series
> conflict with each other: you are trying to optimize the existing
> general-purpose case, while I'm trying to optimize the case where both host
> and guest support vector instructions. Since I'm experimenting on ARM32, we
> shouldn't have many merge conflicts either.

I posted my own, different, take on vectorization yesterday as well.

  http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg03272.html

The primary difference between my version and your version is that I do not
allow target/cpu/translate*.c to create vector types.  All of the host vector
expansion is done within tcg/*.c.
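
One way to picture "all host vector expansion within tcg/*.c" is a front end that only requests a generic operation of oprsz bytes, while the tcg/ side picks how to emit it. The sketch below is a hypothetical model, not code from either series: the enum, the function choose_expansion and the MAX_INLINE_OPS threshold are all illustrative assumptions. It maps onto schemes (1) to (3) above and falls back to a helper for large sizes such as SVE's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: how a tcg/-side expander might choose a strategy for one
 * generic vector op of oprsz bytes.  The front end never sees host
 * vector types; it only states the operation and its size. */
enum expand_kind {
    EXPAND_HOST_VECTOR,   /* scheme (1): host vector instructions  */
    EXPAND_SCALAR_I64,    /* scheme (2): series of 64-bit scalar ops */
    EXPAND_HELPER         /* scheme (3): one out-of-line helper call */
};

/* Assumed cap on inline host ops, to bound generated code size.
 * The value 4 is purely illustrative. */
#define MAX_INLINE_OPS 4

/* host_vec_bytes == 0 means the host has no usable vector unit. */
static enum expand_kind choose_expansion(uint32_t oprsz,
                                         uint32_t host_vec_bytes)
{
    assert(oprsz % 8 == 0);
    if (host_vec_bytes != 0 && oprsz % host_vec_bytes == 0
        && oprsz / host_vec_bytes <= MAX_INLINE_OPS) {
        return EXPAND_HOST_VECTOR;
    }
    if (oprsz / 8 <= MAX_INLINE_OPS) {
        return EXPAND_SCALAR_I64;
    }
    return EXPAND_HELPER;
}
```

With these assumed numbers, a 16-byte op uses host vectors when 16-byte host registers exist and 64-bit scalar ops otherwise, while a 256-byte (2048-bit) op always goes out of line.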

We would also like to settle on a common style for out-of-line helpers, if that
is possible.  One caveat about copying our current SSE emulation: we do not yet
support the AVX, AVX2, or AVX512 extensions, so the current construction of
helpers within target/i386/ doesn't really take into account everything that
will be required.

The thing that's common between AVX512 and SVE is that we have multiple vector
lengths, and that elements beyond the operation length are zeroed.  Both Alex
and I have packed operation length + full vector length into a descriptor given
to the helper.  (Alex allows for some other bits too; I'm not sure about that.)
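
A descriptor of this kind might look like the sketch below. The layout is an assumption for illustration and is not the encoding used in Alex's or Richard's patches: both sizes are multiples of 8 bytes, stored as (size/8 - 1) in two 5-bit fields, which covers 8 to 256 bytes (up to SVE's 2048 bits). The names make_desc, desc_oprsz, desc_maxsz and helper_vec_add64 are hypothetical. The helper operates on the first oprsz bytes and zeroes the tail up to maxsz, as AVX512 and SVE require.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor layout: sizes in units of 8 bytes, minus one,
 * in 5-bit fields.  Remaining high bits could carry extra data. */
#define DESC_OPRSZ_SHIFT 0
#define DESC_MAXSZ_SHIFT 5
#define DESC_FIELD_MASK  0x1fu

static uint32_t make_desc(uint32_t oprsz, uint32_t maxsz)
{
    assert(oprsz % 8 == 0 && maxsz % 8 == 0);
    assert(oprsz >= 8 && oprsz <= maxsz && maxsz <= 256);
    return ((oprsz / 8 - 1) << DESC_OPRSZ_SHIFT)
         | ((maxsz / 8 - 1) << DESC_MAXSZ_SHIFT);
}

static uint32_t desc_oprsz(uint32_t desc)
{
    return (((desc >> DESC_OPRSZ_SHIFT) & DESC_FIELD_MASK) + 1) * 8;
}

static uint32_t desc_maxsz(uint32_t desc)
{
    return (((desc >> DESC_MAXSZ_SHIFT) & DESC_FIELD_MASK) + 1) * 8;
}

/* Out-of-line helper: add 64-bit lanes for the operation length, then
 * zero the rest of the register up to the full vector length. */
static void helper_vec_add64(void *d, const void *a, const void *b,
                             uint32_t desc)
{
    uint32_t oprsz = desc_oprsz(desc), maxsz = desc_maxsz(desc);
    uint64_t *dd = d;
    const uint64_t *aa = a, *bb = b;

    for (uint32_t i = 0; i < oprsz / 8; i++) {
        dd[i] = aa[i] + bb[i];
    }
    memset((char *)d + oprsz, 0, maxsz - oprsz);
}
```

Packing both lengths into one 32-bit argument keeps every helper to a single calling convention (pointers plus a descriptor), whatever the vector length in use.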


r~


