
Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw


From: gaosong
Subject: Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
Date: Fri, 24 Feb 2023 15:24:00 +0800
User-agent: Mozilla/5.0 (X11; Linux loongarch64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0


On 2023/2/23 at 11:22 PM, Richard Henderson wrote:
On 2/22/23 22:23, gaosong wrote:
Hi, Richard

On 2023/2/21 at 1:21 AM, Richard Henderson wrote:
On 2/19/23 21:47, gaosong wrote:
I have some questions:
1. Do we need to implement GVecGen* for simple gvec instructions,
   such as add, sub, or, xor?

No, these are done generically.

2. Do we need to implement all of fni8/fni4, fniv, and fno?

You need not implement them all.  Generally you will only implement fni4 for 32-bit arithmetic operations, and only fni8 for logical operations; there is rarely a cause for both with the same operation.

You can rely on the generic cutoff of 4 integer inline operations -- easy for your maximum vector length of 128-bits -- to avoid implementing fno.

But in the extreme, you can implement only fno.  You can choose this over directly calling a helper function, minimizing differences in the translator code paths and letting generic code build all of the pointers.

Sorry for the late reply, and thanks for your answers.

But I still need more help.

How does gvec handle signed or unsigned extension of vector elements?

There is no generic sign-extension; that turns out to be widely variable across the different hosts and guest architectures.

If your architecture widens the even elements, you can implement extensions as a pair of shifts in the wider element size.  E.g. sign-extend is shl + sar.
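As an illustration of the shl + sar idea, here is a minimal scalar sketch of what happens to a single lane. It is not TCG code: the function name sext_low_half and the choice of holding one lane in the low bits of a uint64_t are assumptions made for this example only. The vece argument uses the same encoding as the element sizes discussed in this thread (1 for 16-bit lanes, 2 for 32-bit, 3 for 64-bit). It also assumes that >> on a negative signed value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low half of one lane that is (8 << vece) bits wide.
 * The lane is passed in the low bits of a uint64_t (hypothetical layout,
 * chosen only so that one function can model 16-, 32- and 64-bit lanes).
 */
int64_t sext_low_half(uint64_t lane, unsigned vece)
{
    unsigned lanebits = 8u << vece;   /* width of the wide element */
    unsigned halfbits = 4u << vece;   /* width of the narrow element */

    assert(vece >= 1 && vece <= 3);

    /* Park the lane at the top of the container so that the shifts
     * below behave like lane-wide shifts. */
    uint64_t top = lane << (64 - lanebits);

    /* shl: push the low half up, discarding the high half of the lane. */
    top <<= halfbits;

    /* sar: arithmetic shift back down by halfbits, plus the distance
     * needed to return the lane to the low bits of the container. */
    return (int64_t)top >> (halfbits + (64 - lanebits));
}
```

For a 16-bit lane holding 0x1280, the high byte 0x12 is dropped by the left shift and the low byte 0x80 comes back as -128.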

I found no gvec function that implements signed or unsigned extension of vector elements. However, some instructions require their elements to be sign- or zero-extended.

You may need to implement these operations with fni[48] or out of line in a helper. It's hard to give advice without a specific example.
I was wrong: the instruction sign-extends the odd or even elements of the source vectors before the operation; it does not sign-extend the result.
E.g.
vaddwev_h_b  vd, vj, vk
vd->H[i] = SignExtend(vj->B[2i]) + SignExtend(vk->B[2i]);
vaddwev_w_h  vd, vj, vk
vd->W[i] = SignExtend(vj->H[2i]) + SignExtend(vk->H[2i]);
vaddwev_d_w  vd, vj, vk
vd->D[i] = SignExtend(vj->W[2i]) + SignExtend(vk->W[2i]);
vaddwev_q_d  vd, vj, vk
vd->Q[i] = SignExtend(vj->D[2i]) + SignExtend(vk->D[2i]);
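
The element-wise definitions above can be written out as a plain C reference model, which may be handy for checking a TCG expansion or a helper against. This is only a sketch: the Vec128 union and the ref_* function names are hypothetical and are not QEMU's types or helpers, element i is simply taken to be array index i, and the q_d form is left out because standard C has no portable 128-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit register view, for illustration only. */
typedef union {
    int8_t  B[16];
    int16_t H[8];
    int32_t W[4];
    int64_t D[2];
} Vec128;

/* vaddwev_h_b: H[i] = sext(vj.B[2i]) + sext(vk.B[2i]) */
void ref_vaddwev_h_b(Vec128 *vd, const Vec128 *vj, const Vec128 *vk)
{
    Vec128 r;                       /* temporary, so vd may alias vj or vk */
    assert(vd && vj && vk);
    for (int i = 0; i < 8; i++) {
        /* int8_t operands are promoted (sign-extended) to int. */
        r.H[i] = (int16_t)(vj->B[2 * i] + vk->B[2 * i]);
    }
    *vd = r;
}

/* vaddwev_w_h: W[i] = sext(vj.H[2i]) + sext(vk.H[2i]) */
void ref_vaddwev_w_h(Vec128 *vd, const Vec128 *vj, const Vec128 *vk)
{
    Vec128 r;
    assert(vd && vj && vk);
    for (int i = 0; i < 4; i++) {
        r.W[i] = (int32_t)vj->H[2 * i] + (int32_t)vk->H[2 * i];
    }
    *vd = r;
}

/* vaddwev_d_w: D[i] = sext(vj.W[2i]) + sext(vk.W[2i]) */
void ref_vaddwev_d_w(Vec128 *vd, const Vec128 *vj, const Vec128 *vk)
{
    Vec128 r;
    assert(vd && vj && vk);
    for (int i = 0; i < 2; i++) {
        r.D[i] = (int64_t)vj->W[2 * i] + (int64_t)vk->W[2 * i];
    }
    *vd = r;
}
```

Because the sum is formed in the wider type, it cannot overflow: two int8_t values always fit in an int16_t, and likewise for the wider forms.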


Use shl + sar to sign-extend the even elements of vj/vk:

static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
                     void (*func)(unsigned, uint32_t, uint32_t,
                                  uint32_t, uint32_t, uint32_t))
{
    uint32_t vd_ofs, vj_ofs, vk_ofs;

    CHECK_SXE;

    vd_ofs = vreg_full_offset(a->vd);
    vj_ofs = vreg_full_offset(a->vj);
    vk_ofs = vreg_full_offset(a->vk);

    func(mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
    return true;
}

static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
    TCGv_vec t1 = tcg_temp_new_vec_matching(a);
    TCGv_vec t2 = tcg_temp_new_vec_matching(b);

    /* vece is the destination element size; sources are half as wide */
    int halfbits = 4 << vece;

    /* Sign-extend even elements from a */
    tcg_gen_dupi_vec(vece, t1, MAKE_64BIT_MASK(0, halfbits));
    tcg_gen_and_vec(vece, a, a, t1);
    tcg_gen_shli_vec(vece, a, a, halfbits);
    tcg_gen_sari_vec(vece, a, a, halfbits);

    /* Sign-extend even elements from b */
    tcg_gen_dupi_vec(vece, t2, MAKE_64BIT_MASK(0, halfbits));
    tcg_gen_and_vec(vece, b, b, t2);
    tcg_gen_shli_vec(vece, b, b, halfbits);
    tcg_gen_sari_vec(vece, b, b, halfbits);

    tcg_gen_add_vec(vece, t, a, b);

    tcg_temp_free_vec(t1);
    tcg_temp_free_vec(t2);
}

static void gvec_vaddwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
                           uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
{
    static const TCGOpcode vecop_list[] = {
        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_add_vec,
        INDEX_op_sari_vec, 0
    };
    static const GVecGen3 op[4] = {
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_h_b,
            .opt_opc = vecop_list,
            .vece = MO_16
        },
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_w_h,
            .opt_opc = vecop_list,
            .vece = MO_32
        },
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_d_w,
            .opt_opc = vecop_list,
            .vece = MO_64
        },
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_q_d,
            .opt_opc = vecop_list,
            .vece = MO_128
        },
    };

    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}

TRANS(vaddwev_h_b, gvec_vvv, MO_8,  gvec_vaddwev_s)
TRANS(vaddwev_w_h, gvec_vvv, MO_16, gvec_vaddwev_s)
TRANS(vaddwev_d_w, gvec_vvv, MO_32, gvec_vaddwev_s)
TRANS(vaddwev_q_d, gvec_vvv, MO_64, gvec_vaddwev_s)
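
One way to sanity-check the and + shl + sar sequence used in gen_vaddwev_s is to model a single 16-bit lane in scalar C and run it over every possible lane value. The sketch below does that; the function names are made up for this example, it is not TCG code, and it assumes >> on a negative signed value is an arithmetic shift. It also suggests that the initial AND with the low-half mask does not change the result, since the left shift discards the high half anyway.

```c
#include <assert.h>
#include <stdint.h>

/* and + shl + sar on one 16-bit lane, mirroring the posted sequence */
int16_t ext_with_mask(uint16_t lane)
{
    uint16_t x = lane & 0x00ff;          /* keep the even (low) byte */
    x = (uint16_t)(x << 8);              /* shl by halfbits */
    return (int16_t)((int16_t)x >> 8);   /* sar by halfbits */
}

/* shl + sar only, without the mask */
int16_t ext_without_mask(uint16_t lane)
{
    uint16_t x = (uint16_t)(lane << 8);  /* the shift drops the odd byte */
    return (int16_t)((int16_t)x >> 8);
}

/* Count the 16-bit lane values for which the two variants disagree. */
unsigned count_mismatches(void)
{
    unsigned n = 0;
    for (uint32_t v = 0; v <= 0xffff; v++) {
        if (ext_with_mask((uint16_t)v) != ext_without_mask((uint16_t)v)) {
            n++;
        }
    }
    return n;
}

/* Aborts if the mask ever makes a difference. */
void check_mask_is_redundant(void)
{
    assert(count_mismatches() == 0);
}
```

If the two variants agree for all lanes, the dupi + and steps could in principle be dropped from the vector expansion, which would also make the two extra temporaries unnecessary.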

I have also implemented the gen_helper_vaddwev_x_x helpers. Is this example correct?

Thanks.
Song Gao



