qemu-devel

Re: [Qemu-devel] AVX support for TCG


From: Richard Henderson
Subject: Re: [Qemu-devel] AVX support for TCG
Date: Mon, 31 Dec 2018 12:58:43 +1100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1

On 12/31/18 7:51 AM, Nick Renieris wrote:
> The PS4's APU doesn't support AVX2 or AVX-512 so I'd be fine if I
> didn't have enough time to implement them.

Fair enough.  A goal like this is a good thing.

>> The tcg-op-gvec.h infrastructure allows for the different modes that avx+mmx
>> allows:
>>
>> (1) 64-bit operations,
>> (2) 128-bit operations, modifying only the low 128 bits,
>> (3) 128-bit operations, zeroing bits beyond the first 128,
>> (4) N*128-bit operations, zeroing bits beyond the first N*128.
> 
> I assume you mean 256-bit ops on (2) and (3), and N*256 on (4)? Low
> 128 bits of a 128-bit number is just the number.

No, I mean

 0FFCC8         paddb   %mm0, %mm1              (1)
 660FFCC8       paddb   %xmm0, %xmm1            (2)
 C5F1FCC8       vpaddb  %xmm0, %xmm1, %xmm1     (3)
 C5F5FCC8       vpaddb  %ymm0, %ymm1, %ymm1     (4)
 62F17548FCC8   vpaddb  %zmm0, %zmm1, %zmm1     (4)

On a system that supports AVX, (2) and (3) both take 128-bit inputs and
produce a 128-bit output, but they have different effects on the rest of the
256-bit register: (2) leaves the upper bits unchanged, while (3) zeroes them.
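
To make the difference between (2) and (3) concrete, here is a small
stand-alone model in plain C. It is only a sketch, not QEMU code: the DemoYmm
type and the demo_* function names are made up for illustration, and the
register is modelled as a flat array of 32 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 256-bit register as 32 bytes. */
typedef struct {
    uint8_t b[32];
} DemoYmm;

/* (2) paddb %xmm_src, %xmm_dst: legacy SSE encoding. */
static void demo_paddb_sse(DemoYmm *dst, const DemoYmm *src)
{
    int i;

    for (i = 0; i < 16; i++) {
        dst->b[i] = (uint8_t)(dst->b[i] + src->b[i]);
    }
    /* Bytes 16..31 of dst are deliberately left untouched. */
}

/* (3) vpaddb %xmm_src2, %xmm_src1, %xmm_dst: VEX.128 encoding. */
static void demo_vpaddb_xmm(DemoYmm *dst, const DemoYmm *src1,
                            const DemoYmm *src2)
{
    int i;

    for (i = 0; i < 16; i++) {
        dst->b[i] = (uint8_t)(src1->b[i] + src2->b[i]);
    }
    /* Bytes 16..31 of dst are zeroed. */
    memset(dst->b + 16, 0, 16);
}
```

Running both on a register whose upper half holds non-zero data shows the
upper half surviving the first call and being cleared by the second.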


> So, I would need to implement every SSE instruction that isn't
> SSE_SPECIAL at the moment, using tcg-op-gvec.h? Or more instructions
> than that?

You'd want to do all of the SSE instructions, SSE_SPECIAL and otherwise.

I believe that we want to eliminate sse_op_table* and implement all insns
within a switch statement, like SSE_SPECIAL.  Note that this does not mean one
gigantic 5000 line function; appropriate use of helper functions should make
the code for each switch entry fairly small.

You'd want to re-organize the code generated by ops_sse.h using the (ptr, ptr,
..., desc) signature of gen_helper_gvec_{2,2i,3,...} and expand them using
tcg_gen_gvec_{2,2i,3,...}_ool.

Examples of these are in accel/tcg/tcg-runtime-gvec.c and
target/arm/vec_helper.c.  Use simd_oprsz to find out how much data should be
operated upon.  The clear_high function should be moved somewhere that it can
be shared.
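
As a rough illustration of the helper shape described above, here is a
stand-alone sketch with the (ptr, ptr, ptr, desc) signature. Everything
prefixed demo_ is a hypothetical stand-in so that the example compiles outside
the QEMU tree; in particular, the descriptor packing below is an assumption
for the example and is not the real simd_desc layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor packing; NOT QEMU's real encoding. */
static uint32_t demo_simd_desc(uint32_t oprsz, uint32_t maxsz)
{
    return (maxsz << 16) | oprsz;
}

/* Stand-in for simd_oprsz: bytes to operate upon. */
static intptr_t demo_simd_oprsz(uint32_t desc)
{
    return desc & 0xffff;
}

/* Stand-in for simd_maxsz: total bytes the operation may write. */
static intptr_t demo_simd_maxsz(uint32_t desc)
{
    return desc >> 16;
}

/* Stand-in for clear_high: zero the bytes in [oprsz, maxsz). */
static void demo_clear_high(void *d, intptr_t oprsz, uint32_t desc)
{
    intptr_t maxsz = demo_simd_maxsz(desc);

    if (maxsz > oprsz) {
        memset((char *)d + oprsz, 0, (size_t)(maxsz - oprsz));
    }
}

/* Out-of-line byte-wise add in the (ptr, ptr, ptr, desc) style. */
void demo_helper_paddb(void *d, void *a, void *b, uint32_t desc)
{
    intptr_t oprsz = demo_simd_oprsz(desc);
    uint8_t *pd = d;
    const uint8_t *pa = a;
    const uint8_t *pb = b;
    intptr_t i;

    for (i = 0; i < oprsz; i++) {
        pd[i] = (uint8_t)(pa[i] + pb[i]);
    }
    demo_clear_high(d, oprsz, desc);
}
```

The point of the shape is that the helper itself never needs to know which
instruction encoding it came from; the sizes carried in desc decide how much
is computed and how much is cleared.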

Once all of this has been done for SSE, AVX is implemented simply by
adjusting the oprsz and maxsz arguments to tcg_gen_gvec_*.
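
One way to picture that adjustment is a small table from the four modes listed
earlier to an (oprsz, maxsz) pair. This is only a sketch under my own
assumptions: the DemoVecMode and DemoVecSize types and demo_vec_size are
hypothetical names, and reg_bytes stands for the full width of the guest
vector register (32 with AVX, 64 with AVX-512).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for modes (1) through (4). */
typedef enum {
    DEMO_MODE_MMX,      /* (1) 64-bit operation */
    DEMO_MODE_SSE,      /* (2) 128-bit, rest of register preserved */
    DEMO_MODE_VEX128,   /* (3) 128-bit, rest of register zeroed */
    DEMO_MODE_VEX256    /* (4) with N = 2, rest of register zeroed */
} DemoVecMode;

typedef struct {
    uint32_t oprsz;     /* bytes to operate upon */
    uint32_t maxsz;     /* bytes written in total; [oprsz, maxsz) zeroed */
} DemoVecSize;

/* Pick the sizes to hand to a gvec-style expander for each mode. */
static DemoVecSize demo_vec_size(DemoVecMode mode, uint32_t reg_bytes)
{
    DemoVecSize s;

    switch (mode) {
    case DEMO_MODE_MMX:
        s.oprsz = 8;
        s.maxsz = 8;
        break;
    case DEMO_MODE_SSE:
        s.oprsz = 16;
        s.maxsz = 16;           /* nothing beyond 128 bits is touched */
        break;
    case DEMO_MODE_VEX128:
        s.oprsz = 16;
        s.maxsz = reg_bytes;    /* zero everything above 128 bits */
        break;
    default:                    /* DEMO_MODE_VEX256 */
        s.oprsz = 32;
        s.maxsz = reg_bytes;    /* zero everything above 256 bits */
        break;
    }
    assert(s.oprsz <= s.maxsz);
    return s;
}
```

With this, the only difference between the legacy SSE form and the VEX.128
form of the same operation is the maxsz value.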

> Assuming I do this for SSE and AVX, I would not need to touch anything
> else like the TCG back-end, as every gvec/vec op is already
> implemented for i386, correct?

Correct.


r~



