qemu-riscv
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC v3 26/71] target/riscv: rvv-1.0: update vext_max_elems() for lo


From: Frank Chang
Subject: Re: [RFC v3 26/71] target/riscv: rvv-1.0: update vext_max_elems() for load/store insns
Date: Fri, 14 Aug 2020 10:48:54 +0800

On Fri, Aug 7, 2020 at 8:04 AM Richard Henderson <richard.henderson@linaro.org> wrote:
On 8/6/20 3:46 AM, frank.chang@sifive.com wrote:
> +static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz, bool is_ldst)
>  {
> -    return simd_maxsz(desc) << vext_lmul(desc);
> +    /*
> +     * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
> +     * so vlen in bytes (vlenb) is encoded as maxsz.
> +     */
> +    uint32_t vlenb = simd_maxsz(desc);
> +
> +    if (is_ldst) {
> +        /*
> +         * Vector load/store instructions have the EEW encoded
> +         * directly in the instructions. The maximum vector size is
> +         * calculated with EMUL rather than LMUL.
> +         */
> +        uint32_t eew = ctzl(esz);
> +        uint32_t sew = vext_sew(desc);
> +        uint32_t lmul = vext_lmul(desc);
> +        int32_t emul = eew - sew + lmul;
> +        uint32_t emul_r = emul < 0 ? 0 : emul;
> +        return 1 << (ctzl(vlenb) + emul_r - ctzl(esz));

As I said before, the is_ldst instructions should put the EEW and EMUL values
into the SEW and LMUL desc fields, so that this does not need to be 
special-cased at all.

I add a vext_get_emul() helper function in trans_rvv.inc.c:

> static uint8_t vext_get_emul(DisasContext *s, uint8_t eew)
> {
>     int8_t lmul = sextract32(s->lmul, 0, 3);
>     int8_t emul = ctzl(eew) - (s->sew + 3) + lmul;  // may remove ctzl() if eew is already log2(eew)
>     return emul < 0 ? 0 : emul;
> }

and pass emul as LMUL field in VDATA so that it can be
reused in vector_helper.c: vext_max_elems():

> uint8_t emul = vext_get_emul(s, eew);
> data = "" VDATA, LMUL, emul);

I also remove the passing SEW field in VDATA codes as I think SEW
might not be required in the updated vext_max_elems() (see below).
 

> +        /* Return VLMAX */
> +        return 1 << (ctzl(vlenb) + vext_lmul(desc) - ctzl(esz));

This is overly complicated.

(1) 1 << ctzl(vlenb) == vlenb.
(2) I'm not sure why esz is not already a log2 number.

esz is passed from e.g. GEN_VEXT_LD_STRIDE() macro:

> #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)        \
> void HELPER(NAME)(void *vd, void * v0, target_ulong base,           \
>                   target_ulong stride, CPURISCVState *env,                      \
>                   uint32_t desc)                                                                   \
> {                                                                                                           \
>     uint32_t vm = vext_vm(desc);                                                          \
>     vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,     \
>                      sizeof(ETYPE), GETPC(), MMU_DATA_LOAD);            \
> }
>
> GEN_VEXT_LD_STRIDE(vlse8_v,  int8_t,  lde_b)

which is calculated by sizeof(ETYPE), so the results would be: 1, 2, 4, 8.
and vext_max_elems() is called by e.g. vext_ldst_stride():

> uint32_t max_elems = vext_max_elems(desc, esz);

I can add another parameter to the macro and pass the hard-coded log2(esz) number
if it's the better way instead of using ctzl().
Or if there's another approach to get the log2(esz) number more elegantly?
 

This ought to look more like

  int scale = lmul - esz;
  return (scale < 0
          ? vlenb >> -scale
          : vlenb << scale);

 
Thanks for the detailed point outs.
I manage to change the codes to below as your suggestion.

> static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz)
> {
>     /*
>      * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
>      * so vlen in bytes (vlenb) is encoded as maxsz.
>      */
>     uint32_t vlenb = simd_maxsz(desc);
>
>     /* Return VLMAX */
>     int scale = vext_lmul(desc) - ctzl(esz);  // may remove ctzl() if esz is already log2(esz)
>     return scale < 0 ? vlenb >> -scale : vlenb << scale;
> }
 

r~

Thanks for the review.
Frank Chang

reply via email to

[Prev in Thread] Current Thread [Next in Thread]