qemu-riscv
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 4/4] RISC-V: add vector extension configure instruction


From: Richard Henderson
Subject: Re: [PATCH v3 4/4] RISC-V: add vector extension configure instruction
Date: Sat, 4 Jan 2020 10:41:04 +1100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2

On 1/3/20 2:33 PM, LIU Zhiwei wrote:
> vsetvl and vsetvli are two configure instructions for vl, vtype. TB flags
> should update after configure instructions. The (ill, lmul, sew ) of vtype
> and the bit of (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.
> 
> Signed-off-by: LIU Zhiwei <address@hidden>
> ---
>  target/riscv/Makefile.objs              |  2 +-
>  target/riscv/cpu.c                      |  1 +
>  target/riscv/cpu.h                      | 55 ++++++++++++++++++++-----
>  target/riscv/helper.h                   |  2 +
>  target/riscv/insn32.decode              |  5 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 52 +++++++++++++++++++++++
>  target/riscv/translate.c                | 17 +++++++-
>  target/riscv/vector_helper.c            | 51 +++++++++++++++++++++++
>  8 files changed, 172 insertions(+), 13 deletions(-)
>  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>  create mode 100644 target/riscv/vector_helper.c
> 
> diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
> index b1c79bc1d1..d577cef9e0 100644
> --- a/target/riscv/Makefile.objs
> +++ b/target/riscv/Makefile.objs
> @@ -1,4 +1,4 @@
> -obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
> gdbstub.o pmp.o
> +obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
> vector_helper.o gdbstub.o pmp.o
>  
>  DECODETREE = $(SRC_PATH)/scripts/decodetree.py
>  
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index c2370a0a57..3ff7b50bff 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -347,6 +347,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
> **errp)
>          }
>      }
>      if (cpu->cfg.vext_spec) {
> +        env->vext.vtype = ~((target_ulong)-1 >> 1);

Better as FIELD_DP64(0, VTYPE, VILL, 1),


> +struct VTYPE {
> +#ifdef HOST_WORDS_BIGENDIAN
> +    target_ulong vill:1;
> +    target_ulong reserved:sizeof(target_ulong) * 8 - 7;
> +    target_ulong sew:3;
> +    target_ulong lmul:2;
> +#else
> +    target_ulong lmul:2;
> +    target_ulong sew:3;
> +    target_ulong reserved:sizeof(target_ulong) * 8 - 7;
> +    target_ulong vill:1;
> +#endif
> +};

Do not use bit fields to describe target register layout.
Use FIELD().

> -#define TB_FLAGS_MMU_MASK   3
> -#define TB_FLAGS_MSTATUS_FS MSTATUS_FS
> +typedef CPURISCVState CPUArchState;
> +typedef RISCVCPU ArchCPU;
> +#include "exec/cpu-all.h"
> +
> +FIELD(TB_FLAGS, MMU, 0, 2)
> +FIELD(TB_FLAGS, FS, 13, 2)

The change to use FIELD for MMU and FS should be made separately from adding
the vector state.

> +FIELD(TB_FLAGS, VL_EQ_VLMAX, 16, 1)
> +FIELD(TB_FLAGS, LMUL, 17, 2)
> +FIELD(TB_FLAGS, SEW, 19, 3)
> +FIELD(TB_FLAGS, VILL, 22, 1)

Why are you leaving holes in TB_FLAGS?  I know why the original hole was there,
since it corresponded to simple masks on other registers.

> +    vlmax = (1 << vtype->lmul) * cpu->cfg.vlen / (8 * (1 << vtype->sew));

Wow, this can be simplified a lot.

   (1 << LMUL) * VLEN / (8 * (1 << SEW))
 = (VLEN << LMUL) / (8 << SEW)
 = (VLEN << LMUL) >> (SEW + 3)
 = VLEN >> (SEW + 3 - LMUL)


> +    vl_eq_vlmax = (env->vext.vstart == 0) && (vlmax == env->vext.vl);
> +
> +    flags = FIELD_DP32(flags, TB_FLAGS, VILL, vtype->vill);
> +    flags = FIELD_DP32(flags, TB_FLAGS, SEW, vtype->sew);
> +    flags = FIELD_DP32(flags, TB_FLAGS, LMUL, vtype->lmul);
> +    flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);

I wonder if perhaps this all ought to be nested under

  if (env->misa & RVV) {
      ...
  } else {
      flag = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
  }

so that, for the normal case when RVV is disabled, we don't bother computing
all of those bits.

> +static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
> +{
> +    TCGv s1, s2, d;
> +    d = tcg_temp_new();
> +    s1 = tcg_temp_new();
> +    s2 = tcg_temp_new();
> +    gen_get_gpr(s1, a->rs1);
> +    gen_get_gpr(s2, a->rs2);
> +    gen_helper_vector_vsetvli(d, cpu_env, s1, s2);
> +    tcg_gen_st_tl(d, cpu_env, offsetof(CPURISCVState, vext.vl));

Why are you performing the store to vl inline, as opposed to within the helper
funtion?

> +    exit_tb(ctx);

A normal exit is correct for vsetvl, because the new state is variable.

> +static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
> +{
> +    TCGv s1, s2, d;
> +    d = tcg_temp_new();
> +    s1 = tcg_temp_new();
> +    s2 = tcg_const_tl(a->zimm);
> +    gen_get_gpr(s1, a->rs1);
> +    gen_helper_vector_vsetvli(d, cpu_env, s1, s2);
> +    tcg_gen_st_tl(d, cpu_env, offsetof(CPURISCVState, vext.vl));
> +    exit_tb(ctx);

You could use

  gen_goto_tb(ctx, 0, ctx->base.pc_next)

here, because the new state is unknown but constant.  It will be the same every
time the instruction is executed, and thus can compute the new state only once,
saving that computation in the link to the next tb.

> +target_ulong VECTOR_HELPER(vsetvli)(CPURISCVState *env, target_ulong s1,
> +    target_ulong s2)
> +{
> +    int vlmax, vl;
> +    RISCVCPU *cpu = env_archcpu(env);
> +    struct VTYPE *vtype = (struct VTYPE *)&s2;

FIELD_EX64 for all uses of VTYPE.

> +
> +    if (vtype->sew > cpu->cfg.elen) { /* only set vill bit. */
> +        env->vext.vtype = ~((target_ulong)-1 >> 1);

FIELD_DP64.

> +    vlmax = (1 << vtype->lmul) * cpu->cfg.vlen / (8 * (1 << vtype->sew));

Same simplification as before.  Perhaps extract this to an inline function for
clarity, documenting the algebraic simplification only once.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]