Re: [Qemu-devel] AVX support for TCG
From: Richard Henderson
Subject: Re: [Qemu-devel] AVX support for TCG
Date: Mon, 31 Dec 2018 12:58:43 +1100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1
On 12/31/18 7:51 AM, Nick Renieris wrote:
> The PS4's APU doesn't support AVX2 or AVX-512 so I'd be fine if I
> didn't have enough time to implement them.
Fair enough. A goal like this is a good thing.
>> The tcg-op-gvec.h infrastructure allows for the different modes that avx+mmx
>> allows:
>>
>> (1) 64-bit operations,
>> (2) 128-bit operations, modifying only the low 128 bits,
>> (3) 128-bit operations, zeroing bits beyond the first 128,
>> (4) N*128-bit operations, zeroing bits beyond the first N*128.
>
> I assume you mean 256-bit ops on (2) and (3), and N*256 on (4)? Low
> 128 bits of a 128-bit number is just the number.
No, I mean
  0FFCC8          paddb   %mm0, %mm1              (1)
  660FFCC8        paddb   %xmm0, %xmm1            (2)
  C5F1FCC8        vpaddb  %xmm0, %xmm1, %xmm1     (3)
  C5F5FCC8        vpaddb  %ymm0, %ymm1, %ymm1     (4)
  62F17548FCC8    vpaddb  %zmm0, %zmm1, %zmm1     (4)
On a system that supports AVX, (2) and (3) both take 128-bit inputs and
produce a 128-bit result, but they have different effects on the rest of the
256-bit register: (2) leaves the upper bits unchanged, while (3) zeroes them.
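To make the difference between (2), (3) and (4) concrete, here is a minimal
sketch that models one 256-bit register as 32 bytes. The Reg256 type and the
model_* function names are hypothetical, made up for this illustration; none
of this is QEMU code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 256-bit vector register as 32 bytes. */
typedef struct { uint8_t b[32]; } Reg256;

/* Byte-wise wrapping add over the low oprsz bytes: dst = dst + src. */
static void add_bytes(Reg256 *dst, const Reg256 *src, size_t oprsz)
{
    assert(oprsz <= sizeof(dst->b));
    for (size_t i = 0; i < oprsz; i++) {
        dst->b[i] = (uint8_t)(dst->b[i] + src->b[i]);
    }
}

/* (2) paddb %xmm0, %xmm1: 128-bit op, bits 128..255 left unchanged. */
static void model_paddb_xmm(Reg256 *dst, const Reg256 *src)
{
    add_bytes(dst, src, 16);
}

/* (3) vpaddb %xmm0, %xmm1, %xmm1: 128-bit op, bits 128..255 zeroed. */
static void model_vpaddb_xmm(Reg256 *dst, const Reg256 *src)
{
    add_bytes(dst, src, 16);
    memset(dst->b + 16, 0, 16);
}

/* (4) vpaddb %ymm0, %ymm1, %ymm1: 256-bit op; nothing is left to zero
 * in this 256-bit model. */
static void model_vpaddb_ymm(Reg256 *dst, const Reg256 *src)
{
    add_bytes(dst, src, 32);
}
```

Starting from a destination whose bytes are all 1 and a source whose bytes
are all 2, the low 16 bytes become 3 in every case; the upper 16 bytes stay 1
for (2), become 0 for (3), and become 3 for (4).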
> So, I would need to implement every SSE instruction that isn't
> SSE_SPECIAL at the moment, using tcg-op-gvec.h? Or more instructions
> than that?
You'd want to do all of the SSE instructions, SSE_SPECIAL and otherwise.
I believe that we want to eliminate sse_op_table* and implement all insns
within a switch statement, like SSE_SPECIAL. Note that this does not mean one
gigantic 5000 line function; appropriate use of helper functions should make
the code for each switch entry fairly small.
You'd want to re-organize the code generated by ops_sse.h using the (ptr, ptr,
..., desc) signature of gen_helper_gvec_{2,2i,3,...} and expand them using
tcg_gen_gvec_{2,2i,3,...}_ool.
Examples of these are in accel/tcg/tcg-runtime-gvec.c and
target/arm/vec_helper.c. Use simd_oprsz to find out how much data should be
operated upon. The clear_high function should be moved somewhere that it can
be shared.
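As a rough, standalone sketch of the helper shape described above: the
function takes (ptr, ptr, ptr, desc), operates on oprsz bytes, and then
clears everything from oprsz up to maxsz. All sketch_* names and the
descriptor layout here are stand-ins invented for this example; the real
code uses simd_oprsz(), simd_maxsz() and QEMU's own descriptor encoding,
which this does not reproduce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in descriptor: low 16 bits = oprsz, high 16 bits = maxsz, both
 * in bytes.  This is NOT QEMU's real simd_desc encoding. */
static uint32_t sketch_desc(uint32_t oprsz, uint32_t maxsz)
{
    assert(oprsz <= maxsz && maxsz <= 0xffff);
    return oprsz | (maxsz << 16);
}

static uint32_t sketch_oprsz(uint32_t desc) { return desc & 0xffff; }
static uint32_t sketch_maxsz(uint32_t desc) { return desc >> 16; }

/* Zero the bytes between oprsz and maxsz, in the spirit of clear_high. */
static void sketch_clear_high(void *d, uint32_t oprsz, uint32_t desc)
{
    uint32_t maxsz = sketch_maxsz(desc);

    if (maxsz > oprsz) {
        memset((uint8_t *)d + oprsz, 0, maxsz - oprsz);
    }
}

/* Byte-wise add helper with a (ptr, ptr, ptr, desc) signature. */
static void sketch_helper_paddb(void *d, void *a, void *b, uint32_t desc)
{
    uint32_t oprsz = sketch_oprsz(desc);
    uint8_t *dp = d;
    const uint8_t *ap = a;
    const uint8_t *bp = b;

    for (uint32_t i = 0; i < oprsz; i++) {
        dp[i] = (uint8_t)(ap[i] + bp[i]);
    }
    sketch_clear_high(d, oprsz, desc);
}
```

With this shape the same helper body serves both the legacy and the VEX
form; only the descriptor passed in differs.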
Once all of this has been done for SSE, AVX is implemented simply by
adjusting the oprsz and maxsz arguments to tcg_gen_gvec_*.
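One way to read "adjusting the oprsz and maxsz arguments" is as a small
mapping from the encoding form to a pair of byte counts, following the four
modes listed earlier. The sketch below is an interpretation of that list,
not code from QEMU; VecForm, VecSizes, sizes_for_form and the reg_bytes
parameter are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the encoding forms in modes (1) to (4). */
typedef enum { FORM_MMX, FORM_SSE, FORM_VEX128, FORM_VEX256 } VecForm;

/* Byte counts: operate on oprsz bytes, zero from oprsz up to maxsz. */
typedef struct { uint32_t oprsz, maxsz; } VecSizes;

/* reg_bytes is the full width of the modeled vector register in bytes,
 * e.g. 32 when only AVX (ymm) is implemented. */
static VecSizes sizes_for_form(VecForm form, uint32_t reg_bytes)
{
    VecSizes s = { 0, 0 };

    assert(reg_bytes >= 32);
    switch (form) {
    case FORM_MMX:      /* (1) 64-bit operation */
        s.oprsz = 8;
        s.maxsz = 8;
        break;
    case FORM_SSE:      /* (2) 128-bit, upper bits untouched */
        s.oprsz = 16;
        s.maxsz = 16;
        break;
    case FORM_VEX128:   /* (3) 128-bit, upper bits zeroed */
        s.oprsz = 16;
        s.maxsz = reg_bytes;
        break;
    case FORM_VEX256:   /* (4) with N = 2 */
        s.oprsz = 32;
        s.maxsz = reg_bytes;
        break;
    }
    return s;
}
```

The resulting pair would then be what gets passed as the oprsz and maxsz
arguments of the expander for the instruction being translated.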
> Assuming I do this for SSE and AVX, I would not need to touch anything
> else like the TCG back-end, as every gvec/vec op is already
> implemented for i386, correct?
Correct.
r~