From: Richard Henderson
Subject: Re: [Qemu-ppc] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations
Date: Mon, 17 Dec 2018 09:39:31 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1

On 12/17/18 4:23 AM, Mark Cave-Ayland wrote:
> NOTE: there are a lot of instructions that cannot (yet) be optimised to use
> TCG vector operations, however it struck me that there may be some potential
> for converting saturating add/sub and cmp instructions if there were a
> mechanism to return a set of flags indicating the result of the
> saturation/comparison.

There are also a lot of instructions that can be converted, but aren't:

* vspltis[bhw] can use tcg_gen_gvec_dup{8,16,32}i.

* vsplt{b,h,w} can use tcg_gen_gvec_dup_mem.
  Note that you'll need something like vec_reg_offset from
  target/arm/translate-a64.h to compute the offset of the
  specific byte/word/long from which we are to splat.

* vmr should be handled by having tcg_gen_gvec_or notice aofs == bofs.
  For ARM, we do special case this during translation.
  But since tcg/tcg-op.c does these things for tcg_gen_or_i64,
  we should probably handle the same set of transformations.

* vnot would need to be handled by actually adding a tcg_gen_gvec_nor
  and then also noticing aofs == bofs.

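The vmr and vnot items rest on two bitwise identities: x | x == x, and ~(x | x) == ~x. A minimal host-side sketch in plain C (not TCG code) is below. The vec128_model type and the model_or/model_nor helpers are hypothetical names made up for this illustration; they only stand in for what a gvec OR or NOR would compute over a 16-byte register.

```c
#include <stdint.h>

/* Hypothetical 128-bit register model: two 64-bit halves. */
typedef struct { uint64_t u64[2]; } vec128_model;

/* Models a bitwise OR across the whole vector (cf. tcg_gen_gvec_or). */
static vec128_model model_or(vec128_model a, vec128_model b)
{
    vec128_model d = { { a.u64[0] | b.u64[0], a.u64[1] | b.u64[1] } };
    return d;
}

/* Models a bitwise NOR across the whole vector (the proposed gvec nor). */
static vec128_model model_nor(vec128_model a, vec128_model b)
{
    vec128_model d = { { ~(a.u64[0] | b.u64[0]), ~(a.u64[1] | b.u64[1]) } };
    return d;
}
```

When both source operands are the same register, model_or returns its input unchanged (a move) and model_nor returns the complement (a not), which is why the expander could notice aofs == bofs and emit the simpler operation.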
For saturation, I think the easiest thing to do is represent SAT as a
ppc_avr_t. We notice saturation by also computing normal arithmetic and
comparing to see if they differ. E.g.

    tcg_gen_gvec_add(vece, offsetof_avr_tmp,
                     offsetof(ra), offsetof(rb), 16, 16);
    tcg_gen_gvec_ssadd(vece, offsetof(rt),
                       offsetof(ra), offsetof(rb), 16, 16);
    tcg_gen_gvec_cmp(TCG_COND_NE, vece, offsetof_avr_tmp,
                     offsetof_avr_tmp, offsetof(rt), 16, 16);
    tcg_gen_gvec_or(vece, offsetof_avr_sat, offsetof_avr_sat,
                    offsetof_avr_tmp, 16, 16);

You only need to convert the ppc_avr_t to a single bit when reading VSCR.
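
To make the data flow of that sequence concrete, here is a rough scalar model in plain C for byte elements (vece = MO_8). It is only a sketch: avr_model stands in for ppc_avr_t, and every model_* function is a hypothetical name invented for this example, not a QEMU API. It assumes the usual two's-complement narrowing when converting to int8_t.

```c
#include <stdint.h>

/* Hypothetical stand-in for ppc_avr_t: 16 signed byte lanes. */
typedef struct { int8_t s8[16]; } avr_model;

/* Wrapping add per lane (models tcg_gen_gvec_add). */
static void model_add8(avr_model *d, const avr_model *a, const avr_model *b)
{
    for (int i = 0; i < 16; i++) {
        d->s8[i] = (int8_t)(uint8_t)((uint8_t)a->s8[i] + (uint8_t)b->s8[i]);
    }
}

/* Signed saturating add per lane (models tcg_gen_gvec_ssadd). */
static void model_ssadd8(avr_model *d, const avr_model *a, const avr_model *b)
{
    for (int i = 0; i < 16; i++) {
        int sum = a->s8[i] + b->s8[i];
        if (sum > INT8_MAX) sum = INT8_MAX;
        if (sum < INT8_MIN) sum = INT8_MIN;
        d->s8[i] = (int8_t)sum;
    }
}

/* Lane becomes all-ones where a != b (models gvec_cmp with TCG_COND_NE). */
static void model_cmpne8(avr_model *d, const avr_model *a, const avr_model *b)
{
    for (int i = 0; i < 16; i++) {
        d->s8[i] = (a->s8[i] != b->s8[i]) ? -1 : 0;
    }
}

/* Bitwise OR per lane (models tcg_gen_gvec_or). */
static void model_or8(avr_model *d, const avr_model *a, const avr_model *b)
{
    for (int i = 0; i < 16; i++) {
        d->s8[i] = (int8_t)(a->s8[i] | b->s8[i]);
    }
}

/* The four-step sequence: rt gets the saturated sum, sat is sticky. */
static void model_sat_add(avr_model *rt, avr_model *sat,
                          const avr_model *ra, const avr_model *rb)
{
    avr_model tmp;
    model_add8(&tmp, ra, rb);      /* normal arithmetic */
    model_ssadd8(rt, ra, rb);      /* saturating arithmetic */
    model_cmpne8(&tmp, &tmp, rt);  /* lanes that differ did saturate */
    model_or8(sat, sat, &tmp);     /* accumulate into the SAT vector */
}

/* Collapse the SAT vector to one bit, as would be done on a VSCR read. */
static int model_read_sat_bit(const avr_model *sat)
{
    for (int i = 0; i < 16; i++) {
        if (sat->s8[i]) {
            return 1;
        }
    }
    return 0;
}
```

The point of keeping SAT as a full vector is that the accumulate step stays a plain vector OR; the reduction to a single bit is paid only when the guest actually reads VSCR.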
For comparisons... that's tricky. I wonder if there's anything better than

    tcg_gen_gvec_cmp(TCG_COND_FOO, vece, offsetof(rt),
                     offsetof(ra), offsetof(rb), 16, 16);
    if (rc) {
        TCGv_i64 hi, lo, t, f;
        tcg_gen_ld_i64(hi, cpu_env, offsetof(rt));
        tcg_gen_ld_i64(lo, cpu_env, offsetof(rt) + 8);
        tcg_gen_and_i64(t, hi, lo);
        tcg_gen_or_i64(f, hi, lo);
        tcg_gen_setcondi_i64(TCG_COND_EQ, t, t, -1);
        tcg_gen_setcondi_i64(TCG_COND_EQ, f, f, 0);
        // truncate to i32, shift, or, and set to cr6.
    }

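
For reference, the same reduction written as ordinary C on two 64-bit halves of the comparison mask. This is a sketch only: model_cr6 is a hypothetical name, and the shift amounts (all-true in the 8s bit, all-false in the 2s bit of the 4-bit field) are an assumption based on the usual AltiVec CR6 layout, since the sequence above leaves the shifts unspecified.

```c
#include <stdint.h>

/*
 * Reduce a 128-bit comparison mask (each lane all-ones or all-zeros)
 * to a 4-bit CR6 value. Shift amounts are assumed, see lead-in.
 */
static uint32_t model_cr6(uint64_t hi, uint64_t lo)
{
    uint64_t t = hi & lo;                    /* all-ones only if every lane is true */
    uint64_t f = hi | lo;                    /* zero only if every lane is false */
    uint32_t all_true = (t == UINT64_MAX);   /* setcondi EQ t, -1 */
    uint32_t all_false = (f == 0);           /* setcondi EQ f, 0 */
    return (all_true << 3) | (all_false << 1);
}
```

A mixed result, with some lanes true and some false, leaves both bits clear.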
r~