Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
From: gaosong
Subject: Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
Date: Fri, 24 Feb 2023 15:24:00 +0800
User-agent: Mozilla/5.0 (X11; Linux loongarch64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0
On 2023/2/23 11:22 PM, Richard Henderson wrote:
On 2/22/23 22:23, gaosong wrote:
Hi, Richard
On 2023/2/21 1:21 AM, Richard Henderson wrote:
On 2/19/23 21:47, gaosong wrote:
I have some questions:
1 Do we need to implement GVecGen* for simple gvec instructions,
such as add, sub, or, xor?
No, these are done generically.
2 Do we need to implement all of fni8/fni4, fniv, and fno?
You need not implement them all. Generally you will only implement
fni4 for 32-bit arithmetic operations, and only fni8 for logical
operations; there is rarely a cause for both with the same operation.
You can rely on the generic cutoff of 4 integer inline operations --
easy for your maximum vector length of 128-bits -- to avoid
implementing fno.
But in the extreme, you can implement only fno. You can choose this
over directly calling a helper function, minimizing differences in
the translator code paths and letting generic code build all of the
pointers.
Sorry for the late reply, and thanks for your answers.
But I still need more help.
How does gvec handle signed or unsigned extension of vector elements?
There is no generic sign-extension operation; that turns out to be widely
variable across the different hosts and guest architectures.
If your architecture widens the even elements, you can implement
extensions as a pair of shifts in the wider element size. E.g.
sign-extend is shl + sar.
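The shl + sar idea above can be sketched on a single 16-bit lane in plain C. This is only a scalar illustration, not TCG code: the function names sext_even_byte and zext_even_byte are hypothetical, and the sketch assumes the compiler implements right shift of negative signed values as an arithmetic shift (checked at compile time below).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: signed right shift is arithmetic (true for mainstream compilers). */
static_assert((-1 >> 1) == -1, "arithmetic right shift assumed");

/* Sign-extend the even (low) byte of a 16-bit lane: shl, then sar. */
static int16_t sext_even_byte(uint16_t lane)
{
    /* shl by half the lane width pushes the odd (high) byte out of the lane. */
    uint16_t shifted = (uint16_t)(lane << 8);
    /* sar by the same amount replicates the even byte's sign bit. */
    return (int16_t)((int16_t)shifted >> 8);
}

/* Zero-extend the same byte: shl, then a logical shift right. */
static uint16_t zext_even_byte(uint16_t lane)
{
    return (uint16_t)((uint16_t)(lane << 8) >> 8);
}
```

Because the left shift already discards the odd byte, no separate masking step is needed before it in this scalar model.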
I found no gvec function that implements signed and unsigned
extensions of vector elements.
However, some instructions require the result elements to be
sign-extended or zero-extended.
You may need to implement these operations with fni[48] or out of line
in a helper.
It's hard to give advice without a specific example.
I was wrong, the instruction is to sign-extend the odd or even elements
of the vector before the operation, not to sign-extend the result.
E.g.
vaddwev_h_b vd, vj, vk
vd->H[i] = SignExtend(vj->B[2i]) + SignExtend(vk->B[2i]);
vaddwev_w_h vd, vj, vk
vd->W[i] = SignExtend(vj->H[2i]) + SignExtend(vk->H[2i]);
vaddwev_d_w vd, vj, vk
vd->D[i] = SignExtend(vj->W[2i]) + SignExtend(vk->W[2i]);
vaddwev_q_d vd, vj, vk
vd->Q[i] = SignExtend(vj->D[2i]) + SignExtend(vk->D[2i]);
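As a cross-check for the semantics listed above, here is a minimal scalar reference model of the byte-to-halfword case in C. It is a sketch under stated assumptions: vreg_sketch is a hypothetical stand-in for a 128-bit register (not QEMU's actual register type), and vaddwev_h_b_ref is an illustrative name, not the helper from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit register model; not QEMU's real type. */
typedef union {
    int8_t  B[16];
    int16_t H[8];
} vreg_sketch;

static_assert(sizeof(vreg_sketch) == 16, "register model must be 128 bits");

/* vaddwev_h_b: vd->H[i] = SignExtend(vj->B[2i]) + SignExtend(vk->B[2i]) */
static void vaddwev_h_b_ref(vreg_sketch *vd,
                            const vreg_sketch *vj,
                            const vreg_sketch *vk)
{
    vreg_sketch tmp;  /* build the result in a temporary so vd may alias vj or vk */

    for (int i = 0; i < 8; i++) {
        /* int8_t converts to int16_t with sign extension; the sum fits in 16 bits. */
        tmp.H[i] = (int16_t)((int16_t)vj->B[2 * i] + (int16_t)vk->B[2 * i]);
    }
    *vd = tmp;
}
```

Writing the result through a temporary keeps the inputs intact until every even element has been read, which matters when the destination register is also a source.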
Use shl + sar to sign-extend the even elements of vj/vk.
static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
                     void (*func)(unsigned, uint32_t, uint32_t,
                                  uint32_t, uint32_t, uint32_t))
{
    uint32_t vd_ofs, vj_ofs, vk_ofs;

    CHECK_SXE;

    vd_ofs = vreg_full_offset(a->vd);
    vj_ofs = vreg_full_offset(a->vj);
    vk_ofs = vreg_full_offset(a->vk);

    func(mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
    return true;
}
static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
    TCGv_vec t1 = tcg_temp_new_vec_matching(a);
    TCGv_vec t2 = tcg_temp_new_vec_matching(b);
    int halfbits = 4 << vece;

    /* Sign-extend even elements from a */
    tcg_gen_dupi_vec(vece, t1, MAKE_64BIT_MASK(0, halfbits));
    tcg_gen_and_vec(vece, a, a, t1);
    tcg_gen_shli_vec(vece, a, a, halfbits);
    tcg_gen_sari_vec(vece, a, a, halfbits);

    /* Sign-extend even elements from b */
    tcg_gen_dupi_vec(vece, t2, MAKE_64BIT_MASK(0, halfbits));
    tcg_gen_and_vec(vece, b, b, t2);
    tcg_gen_shli_vec(vece, b, b, halfbits);
    tcg_gen_sari_vec(vece, b, b, halfbits);

    tcg_gen_add_vec(vece, t, a, b);

    tcg_temp_free_vec(t1);
    tcg_temp_free_vec(t2);
}
static void gvec_vaddwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
                           uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
{
    static const TCGOpcode vecop_list[] = {
        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_add_vec,
        INDEX_op_sari_vec, 0
    };
    static const GVecGen3 op[4] = {
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_h_b,
            .opt_opc = vecop_list,
            .vece = MO_16
        },
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_w_h,
            .opt_opc = vecop_list,
            .vece = MO_32
        },
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_d_w,
            .opt_opc = vecop_list,
            .vece = MO_64
        },
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_q_d,
            .opt_opc = vecop_list,
            .vece = MO_128
        },
    };

    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
TRANS(vaddwev_h_b, gvec_vvv, MO_8, gvec_vaddwev_s)
TRANS(vaddwev_w_h, gvec_vvv, MO_16, gvec_vaddwev_s)
TRANS(vaddwev_d_w, gvec_vvv, MO_32, gvec_vaddwev_s)
TRANS(vaddwev_q_d, gvec_vvv, MO_64, gvec_vaddwev_s)
I have also implemented gen_helper_vaddwev_x_x. Is this example correct?
Thanks.
Song Gao