Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw

qemu-devel


From:	gaosong
Subject:	Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
Date:	Fri, 24 Feb 2023 15:24:00 +0800
User-agent:	Mozilla/5.0 (X11; Linux loongarch64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0


On 2023/2/23 at 11:22 PM, Richard Henderson wrote:

On 2/22/23 22:23, gaosong wrote:
Hi, Richard

On 2023/2/21 at 1:21 AM, Richard Henderson wrote:
On 2/19/23 21:47, gaosong wrote:
I have some questions:
1 Do we need to implement GVecGen* for simple gvec instructions,
     such as add, sub, or, xor?
No, these are done generically.
2 Do we need to implement all of fni8/fni4, fniv, and fno?
You need not implement them all. Generally you will only implement fni4 for 32-bit arithmetic operations, and only fni8 for logical operations; there is rarely a cause for both with the same operation.
You can rely on the generic cutoff of 4 integer inline operations -- easy for your maximum vector length of 128 bits -- to avoid implementing fno.
But in the extreme, you can implement only fno. You can choose this over directly calling a helper function, minimizing differences in the translator code paths and letting generic code build all of the pointers.
Sorry for the late reply, and thanks for your answers.

But I still need more help.

How does gvec handle signed or unsigned extension of vector elements?
There is no generic sign-extension; that turns out to be widely variable across the different hosts and guest architectures.
If your architecture widens the even elements, you can implement extensions as a pair of shifts in the wider element size. E.g. sign-extend is shl + sar.
I found no gvec function that implements signed and unsigned extensions of vector elements. However, the result of some instructions requires the elements to be sign- or zero-extended.
You may need to implement these operations with fni[48] or out of line in a helper. It's hard to give advice without a specific example.

I was wrong: the instruction sign-extends the odd or even elements of the vector before the operation; it does not sign-extend the result.

E.g
vaddwev_h_b  vd, vj, vk
vd->H[i] = SignExtend(vj->B[2i])  + SignExtend(vk->B[2i]);
vaddwev_w_h  vd, vj, vk
vd->W[i] = SignExtend(vj->H[2i])  + SignExtend(vk->H[2i]);
vaddwev_d_w  vd, vj, vk
vd->D[i] = SignExtend(vj->W[2i])  + SignExtend(vk->W[2i]);
vaddwev_q_d  vd, vj, vk
vd->Q[i] = SignExtend(vj->D[2i])  + SignExtend(vk->D[2i]);
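
For reference, the byte-to-halfword case above could be modelled in plain C roughly as follows. This is only a scalar sketch for checking the intended semantics: the vec128_t union and the name ref_vaddwev_h_b are hypothetical and are not the types or helpers used in the patch.

```c
#include <stdint.h>

/* Hypothetical 128-bit register view, for illustration only. */
typedef union {
    int8_t  B[16];
    int16_t H[8];
} vec128_t;

/*
 * Scalar model of vaddwev_h_b:
 *   vd->H[i] = SignExtend(vj->B[2i]) + SignExtend(vk->B[2i])
 * The result is built in a temporary so that vd may alias vj or vk.
 */
static void ref_vaddwev_h_b(vec128_t *vd, const vec128_t *vj,
                            const vec128_t *vk)
{
    vec128_t r;

    for (int i = 0; i < 8; i++) {
        /* int8_t -> int16_t conversion performs the sign extension. */
        int16_t a = vj->B[2 * i];
        int16_t b = vk->B[2 * i];
        r.H[i] = (int16_t)(a + b);
    }
    *vd = r;
}
```

The odd-numbered bytes of vj and vk are never read, and the sum of two sign-extended bytes always fits in a halfword, so no wrap-around occurs in this case.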


Use shl + sar to sign-extend the even elements of vj/vk.
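
As a sanity check of the shl + sar idea on a single 16-bit lane, a minimal scalar sketch might look like this. The function name sext_even_byte is hypothetical, and the code assumes a compiler (such as GCC or Clang) where converting to int16_t wraps and right-shifting a negative value is an arithmetic shift.

```c
#include <stdint.h>

/*
 * Sign-extend the low (even) byte of a 16-bit lane with shl + sar.
 * Assumes two's-complement conversion and arithmetic right shift.
 */
static int16_t sext_even_byte(uint16_t lane)
{
    const int halfbits = 8;

    /* shl: done in unsigned arithmetic, then truncated to the lane width. */
    uint16_t shifted = (uint16_t)(lane << halfbits);

    /* sar: the sign bit of the original low byte is replicated upward. */
    return (int16_t)((int16_t)shifted >> halfbits);
}
```

For example, sext_even_byte(0x1280) yields -128. The left shift already discards the high (odd) byte, so in this scalar model no separate mask is needed before shifting.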

static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
                     void (*func)(unsigned, uint32_t, uint32_t,
                                  uint32_t, uint32_t, uint32_t))
{
    uint32_t vd_ofs, vj_ofs, vk_ofs;

    CHECK_SXE;

    vd_ofs = vreg_full_offset(a->vd);
    vj_ofs = vreg_full_offset(a->vj);
    vk_ofs = vreg_full_offset(a->vk);

    func(mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
    return true;
}

static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
    TCGv_vec t1 = tcg_temp_new_vec_matching(a);
    TCGv_vec t2 = tcg_temp_new_vec_matching(b);

    int halfbits  =  4 << vece;

    /* Sign-extend even elements from a */
    tcg_gen_dupi_vec(vece, t1, MAKE_64BIT_MASK(0, halfbits));
    tcg_gen_and_vec(vece, a, a, t1);
    tcg_gen_shli_vec(vece, a, a, halfbits);
    tcg_gen_sari_vec(vece, a, a, halfbits);

    /* Sign-extend even elements from b */
    tcg_gen_dupi_vec(vece, t2, MAKE_64BIT_MASK(0, halfbits));
    tcg_gen_and_vec(vece, b, b, t2);
    tcg_gen_shli_vec(vece, b, b, halfbits);
    tcg_gen_sari_vec(vece,  b, b, halfbits);

    tcg_gen_add_vec(vece, t, a, b);

    tcg_temp_free_vec(t1);
    tcg_temp_free_vec(t2);
}

static void gvec_vaddwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
                           uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
{
    static const TCGOpcode vecop_list[] = {
        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_add_vec,
        INDEX_op_sari_vec, 0
    };
    static const GVecGen3 op[4] = {
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_h_b,
            .opt_opc = vecop_list,
            .vece = MO_16
        },
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_w_h,
            .opt_opc = vecop_list,
            .vece = MO_32
        },
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_d_w,
            .opt_opc = vecop_list,
            .vece = MO_64
        },
        {
            .fniv = gen_vaddwev_s,
            .fno = gen_helper_vaddwev_q_d,
            .opt_opc = vecop_list,
            .vece = MO_128
        },
    };

    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}

TRANS(vaddwev_h_b, gvec_vvv, MO_8,  gvec_vaddwev_s)
TRANS(vaddwev_w_h, gvec_vvv, MO_16, gvec_vaddwev_s)
TRANS(vaddwev_d_w, gvec_vvv, MO_32, gvec_vaddwev_s)
TRANS(vaddwev_q_d, gvec_vvv, MO_64, gvec_vaddwev_s)

I have also implemented the gen_helper_vaddwev_x_x helpers. Is this example correct?

Thanks.
Song Gao
