[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH RFC v3 00/11] Add RX archtecture support
From: |
Yoshinori Sato |
Subject: |
Re: [Qemu-devel] [PATCH RFC v3 00/11] Add RX archtecture support |
Date: |
Wed, 20 Mar 2019 23:05:05 +0900 |
User-agent: |
Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (Gojō) APEL/10.8 EasyPG/1.0.0 Emacs/25.1 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) |
On Fri, 08 Mar 2019 10:24:23 +0900,
Richard Henderson wrote:
>
> On 3/1/19 10:21 PM, Yoshinori Sato wrote:
> > My git repository is bellow.
> > git://git.pf.osdn.net/gitroot/y/ys/ysato/qemu.git
>
> Somehow patch 1 did not arrive, so I am reviewing based on
> rebasing this branch against master, and then looking at the diff.
>
> > +struct CCop;
>
> Unused?
Yes. I forgot remove.
Remove it.
> > +static inline void cpu_get_tb_cpu_state(CPURXState *env, target_ulong *pc,
> > + target_ulong *cs_base, uint32_t
> > *flags)
> > +{
> > + *pc = env->pc;
> > + *cs_base = 0;
> > + *flags = 0;
> > +}
>
> *flags should contain PSW[PM], I think, so that knowledge of the
> privilege level is given to the translator.
>
> Looking forward I see that you're currently testing cpu_psw_pm dynamically for
> instructions that require privs, so what you have is not wrong, but this is
> exactly the sort of thing that TB flags are for.
OK.
Update PSW.PM copy to DisasContextBase.flags.
> > +#define RX_PSW_OP_NEG 4
>
> Unused?
Yes. Remove it.
> > +#define RX_BYTE 0
> > +#define RX_WORD 1
> > +#define RX_LONG 2
>
> Redundant with TCGMemOps: MO_8, MO_16, MO_32?
OK. Convert it.
> > +++ b/target/rx/insns.decode
>
> Should have a copyright + license notice.
>
> > +BCnd_b 0010 cd:4 dsp:8 &bcnd
> ...
> > +#BRA_b 0010 1110 dsp:8 # overlaps BCnd_b
>
> FYI, using pattern groups this can be written
>
> {
> BRA_b 0010 1110 dsp:8
> Bcnd_b 0010 cd:4 dsp:8
> }
>
> I expect to have pattern groups merged this week.
OK. Update pattern groups.
> > +static void rx_tr_init_disas_context(DisasContextBase *dcbase, CPUState
> > *cs)
> > +{
> > + DisasContext *ctx = container_of(dcbase, DisasContext, base);
> > +
> > + ctx->src = tcg_temp_new();
> > +}
>
> This allocation will not survive across a branch or label.
> So, after any SHIFTR_REG, for instance.
>
> I think you need to allocate this on demand each insn, and
> free it as necessary after each insn.
>
> (Although avoiding branches in tcg is always helpful.)
OK. Remove it.
> > +/* generic load / store wrapper */
> > +static inline void rx_gen_ldst(unsigned int size, unsigned int dir,
> > + TCGv reg, TCGv mem)
> > +{
> > + if (dir) {
> > + tcg_gen_qemu_ld_i32(reg, mem, 0, size | MO_SIGN | MO_TE);
> > + } else {
> > + tcg_gen_qemu_st_i32(reg, mem, 0, size | MO_TE);
> > + }
> > +}
>
> It would probably be worthwhile to split this function, and drop the "dir"
> parameter. It is always a constant, so instead of
>
> rx_gen_ldst(mi, RX_MEMORY_LD, ctx->src, ctx->src);
> rx_gen_ldst(a->sz, RX_MEMORY_ST, cpu_regs[a->rs], ctx->src);
>
> use the shorter
>
> rx_gen_ld(mi, ctx->src, ctx->src);
> rx_gen_st(a->sz, cpu_regs[a->rs], ctx->src);
OK. fixed
>
> > +/* mov.l #uimm4,rd */
> > +/* mov.l #uimm8,rd */
> > +static bool trans_MOV_ri(DisasContext *ctx, arg_MOV_ri *a)
> > +{
> > + tcg_gen_movi_i32(cpu_regs[a->rd], a->imm & 0xff);
> > + return true;
> > +}
>
> a->imm will already have been masked by the decode.
> You can drop the & 0xff here.
That was right.
It seems that I was misunderstood.
> > +/* mov.l #imm,rd */
> > +static bool trans_MOV_rli(DisasContext *ctx, arg_MOV_rli *a)
> > +{
> > + tcg_gen_movi_i32(cpu_regs[a->rd], a->imm);
> > + return true;
> > +}
>
> As written, this function is redundant. We should be using the same MOV_ri,
> with the immediate loaded from %b2_li_2.
>
> Likewise for trans_MOV_mi vs trans_MOV_mli. That said...
OK. Unified various interger instructions.
> > +static bool trans_MOV_mli(DisasContext *ctx, arg_MOV_mli *a)
> > +{
> > + TCGv imm = tcg_const_i32(a->imm);
> > + if (a->ld == 2) {
> > + a->dsp = bswap_16(a->dsp);
> > + }
>
> This suggests that the decode is incorrect. Or perhaps the construction of
> the
> 32-bit insn passed into decode. In decode_load_bytes, we construct a
> big-endian value, so it would seem this dsp field should be loaded as a
> little-endian value.
>
> This could be fixed by not attempting to load the LI constant in decodetree
> (for this insn), which would in turn not require that you decode the LD
> operand
> by hand in decodetree. E.g.
>
> static bool trans_MOV_mli(DisasContext *ctx, arg_MOV_mli *a)
> {
> TCGv imm;
>
> /* dsp operand comes before immediate operand */
> rx_index_addr(a->ld, a->sz, a->rd, s);
> imm = tcg_const_i32(li(s, a->li));
> rx_gen_st(a->sz, imm, ctx->src);
> tcg_temp_free_i32(imm);
> return true;
> }
>
> This will be easiest if you ever support the big-endian version of RX.
OK. fixed another way.
But RX big-endian mode only data access.
So operand value always little-endian order.
> > +/* push rs */
> > +static bool trans_PUSH_r(DisasContext *ctx, arg_PUSH_r *a)
> > +{
> > + if (a->rs != 0) {
> > + tcg_gen_subi_i32(cpu_regs[0], cpu_regs[0], 4);
> > + rx_gen_ldst(a->sz, RX_MEMORY_ST, cpu_regs[a->rs], cpu_regs[0]);
> > + } else {
> > + tcg_gen_mov_i32(ctx->src, cpu_regs[a->rs]);
> > + tcg_gen_subi_i32(cpu_regs[0], cpu_regs[0], 4);
> > + rx_gen_ldst(a->sz, RX_MEMORY_ST, ctx->src, cpu_regs[0]);
> > + }
> > + return true;
>
> As far as I can see the THEN and ELSE cases have identical operation.
It little different.
In the case of r0, the value before decrementing is stored in memory.
I added comment.
> > +static bool trans_XCHG_rl(DisasContext *ctx, arg_XCHG_rl *a)
> > +{
> > + int sz;
> > + TCGv tmp;
> > + tmp = tcg_temp_new();
> > + if (a->ld == 3) {
> > + /* xchg rs,rd */
> > + tcg_gen_mov_i32(tmp, cpu_regs[a->rs]);
> > + tcg_gen_mov_i32(cpu_regs[a->rs], cpu_regs[a->rd]);
> > + tcg_gen_mov_i32(cpu_regs[a->rd], tmp);
> > + } else {
> > + switch (a->mi) {
> > + case 0 ... 2:
> > + rx_index_addr(a->ld, a->mi, a->rs, ctx);
> > + rx_gen_ldst(a->mi, RX_MEMORY_LD, tmp, ctx->src);
> > + sz = a->mi;
> > + break;
> > + case 3:
> > + rx_index_addr(a->ld, RX_MEMORY_WORD, a->rs, ctx);
> > + rx_gen_ldu(RX_MEMORY_WORD, tmp, ctx->src);
> > + sz = RX_MEMORY_WORD;
> > + break;
> > + case 4:
> > + rx_index_addr(a->ld, RX_MEMORY_BYTE, a->rs, ctx);
> > + rx_gen_ldu(RX_MEMORY_BYTE, tmp, ctx->src);
> > + sz = RX_MEMORY_BYTE;
> > + break;
> > + }
> > + rx_gen_ldst(sz, RX_MEMORY_ST, cpu_regs[a->rd], ctx->src);
> > + tcg_gen_mov_i32(cpu_regs[a->rd], tmp);
> > + }
>
> While I doubt that there is an SMP RX cpu, I think this should still implement
> an atomic xchg operation. This will allow rx-linux-user to run on SMP host
> cpus. Thus use
>
> static inline TCGMemOp mi_to_mop(unsigned mi)
> {
> static const TCGMemOp mop[5] = { MO_SB, MO_SW, MO_UL, MO_UH, MO_UB };
> tcg_debug_assert(mi < 5);
> return mop[mi];
> }
>
> tcg_gen_atomic_xchg_i32(cpu_regs[a->rd], ctx->src, cpu_regs[a->rd],
> 0, mi_to_mop(a->mi));
>
OK.
>
> > +#define STCOND(cond) \
> > + do { \
> > + tcg_gen_movi_i32(ctx->imm, a->imm); \
> > + tcg_gen_movi_i32(ctx->one, 1); \
> > + tcg_gen_movcond_i32(cond, cpu_regs[a->rd], cpu_psw_z, \
> > + ctx->one, ctx->imm, cpu_regs[a->rd]); \
> > + return true; \
> > + } while(0)
>
> Unused? This seems close to what you wanted instead of...
Yes. Removed.
> > +#define STZFN(maskop) \
> > + do { \
> > + TCGv mask, imm; \
> > + mask = tcg_temp_new(); \
> > + imm = tcg_temp_new(); \
> > + maskop; \
> > + tcg_gen_andi_i32(imm, mask, a->imm); \
> > + tcg_gen_andc_i32(cpu_regs[a->rd], cpu_regs[a->rd], mask); \
> > + tcg_gen_or_i32(cpu_regs[a->rd], cpu_regs[a->rd], imm); \
> > + tcg_temp_free(mask); \
> > + tcg_temp_free(imm); \
> > + return true; \
> > + } while(0)
>
> ... this.
>
> TCGv zero = tcg_const_i32(0);
> TCGv imm = tcg_const_i32(a->imm);
> tcg_gen_movcond(COND, cpu_regs[a->rd], cpu_psw_z, zero,
> imm, cpu_regs[a->rd]);
>
> where trans_STZ passes TCG_COND_NE and STNZ passes TCG_COND_EQ.
>
> In addition, there's no point in using a #define when a function would work
> just as well.
OK. fixed.
> > +#define GEN_LOGIC_OP(opr, ret, arg1, arg2) \
> > + do { \
> > + opr(ret, arg1, arg2); \
> > + tcg_gen_setcondi_i32(TCG_COND_EQ, cpu_psw_z, ret, 0); \
> > + tcg_gen_setcondi_i32(TCG_COND_GEU, cpu_psw_s, ret, 0x80000000UL); \
> > + } while(0)
>
> Note that the second is the same as
>
> tcg_gen_setcondi_i32(TCG_COND_LT, cpu_psw_s, ret, 0);
> or
> tcg_gen_shri_i32(cpu_psw_s, ret, 31);
>
> That said, this extra computation is exactly why ARM (and others) represent
> the
> flag differently in the field. The inline setting would be
>
> tcg_gen_mov_i32(cpu_psw_z, ret);
> tcg_gen_mov_i32(cpu_psw_s, ret);
>
> And then whenever we *use* these flags, we only look at the sign bit of S, and
> we see if the whole of Z is zero/non-zero. Similarly, the overflow flag is
> also stored in the sign bit of O, so that is computed more easily:
>
> O = (ret ^ in1) & ~(in1 & in2)
>
> with no right-shift required at the end.
>
> This also means that we've reduced the amount of inline computation to the
> point that there is no point in *not* doing the computation every time it is
> required, and so update_psw_o and cpu->psw_v[] goes away.
OK. fixed psw operation.
> Again, no reason for a macro when a function will do.
> They are easier to debug.
OK.
> > + rs = (a->rs2 < 16) ? a->rs2 : a->rd;
>
> Is this due to
>
> > @b2_rds_uimm4 .... .... imm:4 rd:4 &rri rs2=255 len=2
>
> this? A better decode is
>
> @b2_rds_uimm4 .... .... imm:4 rd:4 &rri rs2=%b2_r_0 len=2
>
> i.e. extract the rd field twice. Once directly into rd and again from the
> same
> bits into rs2.
OK. It works fine.
> > +/* revw rs, rd */
> > +static bool trans_REVW(DisasContext *ctx, arg_REVW *a)
> > +{
> > + TCGv hi, lo;
> > +
> > + hi = tcg_temp_new();
> > + lo = tcg_temp_new();
> > + tcg_gen_shri_i32(hi, cpu_regs[a->rs], 16);
> > + tcg_gen_bswap16_i32(hi, hi);
> > + tcg_gen_shli_i32(hi, hi, 16);
> > + tcg_gen_bswap16_i32(lo, cpu_regs[a->rs]);
> > + tcg_gen_or_i32(cpu_regs[a->rd], hi, lo);
> > + tcg_temp_free(hi);
> > + tcg_temp_free(lo);
> > + return true;
> > +}
>
> The bswap16 primitive requires the input to be zero-extended.
> Thus this second bswap16 may fail. Perhaps better as
>
> T0 = 0x00ff00ff;
> T1 = RS & T0;
> T2 = RS >> 8;
> T1 = T1 << 8;
> T2 = T2 & T0;
> RD = T1 | T2;
OK.
> > +/* conditional branch helper */
> > +static void rx_bcnd_main(DisasContext *ctx, int cd, int dst, int len)
> > +{
> > + TCGv t, f, cond;
> > + switch(cd) {
> > + case 0 ... 13:
>
> Indentation is off?
checkpatch.pl said "switch and case should be at the same indent".
> > +static bool trans_BCnd_s(DisasContext *ctx, arg_BCnd_s *a)
> > +{
> > + if (a->dsp < 3)
> > + a->dsp += 8;
>
> This bit should probably be in the decode, via !function.
OK.
> > +static int16_t rev16(uint16_t dsp)
> > +{
> > + return ((dsp << 8) & 0xff00) | ((dsp >> 8) & 0x00ff);
> > +}
> > +
> > +static int32_t rev24(uint32_t dsp)
> > +{
> > + dsp = ((dsp << 16) & 0xff0000) |
> > + (dsp & 0x00ff00) |
> > + ((dsp >> 16) & 0x0000ff);
> > + dsp |= (dsp & 0x00800000) ? 0xff000000 : 0x00000000;
> > + return dsp;
> > +}
>
> These could also be in the decode, probably with %some_field.
> But as with MOV above, maybe ought to be done via an explicit
> call to li(), in order to get a (potential) big-endian RX to work.
This field always little-endian. So it needed.
> Also, watch out for CODING_STYLE violations.
> Please recheck everything with ./scripts/checkpatch.pl.
OK.
> > +/* mvtachi rs */
> > +static bool trans_MVTACHI(DisasContext *ctx, arg_MVTACHI *a)
> > +{
> > + TCGv_i32 hi, lo;
> > + hi = tcg_temp_new_i32();
> > + lo = tcg_temp_new_i32();
> > + tcg_gen_extr_i64_i32(lo, hi, cpu_acc);
> > + tcg_gen_concat_i32_i64(cpu_acc, lo, cpu_regs[a->rs]);
> > + tcg_temp_free_i32(hi);
> > + tcg_temp_free_i32(lo);
> > + return true;
> > +}
>
> Hmm. How about
>
> TCGv_i64 rs64 = tcg_temp_new_i64();
> tcg_gen_extu_i32_i64(rs64, cpu_regs[a->rs]);
> tcg_gen_deposit_i64(cpu_acc, cpu_acc, rs64, 32, 32);
OK. Fixed it.
> > +static inline void clrsetpsw(int dst, int mode)
> > +{
> > + TCGv psw[] = {
> > + cpu_psw_c, cpu_psw_z, cpu_psw_s, cpu_psw_o,
> > + NULL, NULL, NULL, NULL,
> > + cpu_psw_i, cpu_psw_u, NULL, NULL,
> > + NULL, NULL, NULL, NULL
> > + };
> > + TCGLabel *skip;
> > +
> > + skip = gen_new_label();
> > + if (dst >= 8)
> > + tcg_gen_brcondi_i32(TCG_COND_NE, cpu_psw_pm, 0, skip);
> > + tcg_gen_movi_i32(psw[dst], mode);
> > + if (dst == 3)
> > + tcg_gen_movi_i32(cpu_pswop, RX_PSW_OP_NONE);
> > + gen_set_label(skip);
> > +}
>
> When clearing psw_i, you'll need to exit to the main loop so that any pending
> interrupts are recognized. Otherwise you risk going around in a loop and
> never
> delivering the interrupt at all.
OK.
> > +/* rtfi */
> > +static bool trans_RTFI(DisasContext *ctx, arg_RTFI *a)
> > +{
> > + check_previleged();
> > + tcg_gen_mov_i32(cpu_pc, cpu_bpc);
> > + tcg_gen_mov_i32(cpu_psw, cpu_bpsw);
> > + gen_helper_unpack_psw(cpu_env);
> > + ctx->base.is_jmp = DISAS_JUMP;
> > + return true;
> > +}
>
> Likewise, you need to exit to the main loop, because psw_i changed.
OK. All psw_i changed instruction exit to loop.
> > + cpu_pswop = tcg_global_mem_new_i32(cpu_env,
> > + offsetof(CPURXState, psw_op), "");
> > + for (i = 0; i < 3; i++) {
> > + cpu_pswop_v[i] = tcg_global_mem_new_i32(cpu_env,
> > + offsetof(CPURXState,
> > psw_v[i]), "");
>
> Using the empty string for the name of a temp is cruel.
> That'll be very hard to debug.
OK. Removed this variable.
> > +uint32_t psw_cond(CPURXState *env, uint32_t cond)
>
> In general this can be inlined fairly easily.
> See, for instance, target/arm/translate.c, arm_test_cc.
OK. It very helpful.
I reworked psw operation.
> > +uint32_t helper_div(CPURXState *env, uint32_t num, uint32_t den)
> > +{
> > + uint32_t ret = num;
> > + if (den != 0) {
> > + ret = (int32_t)num / (int32_t)den;
> > + }
> > + env->psw_o = (ret == 0 || den == 0);
>
> That is not the correct overflow condition.
> ret == 0 is irrelevant, but INT_MIN / -1 is relevant.
>
Yes. It my mistake.
Fixed it.
>
> Finally, this was way too big to review properly. This patch set will need to
> be broken up into smaller pieces. Perhaps introducing just one, or a few
> related instructions with each patch.
>
Thanks.
I reworked your comment.
I will send v4 patch.
>
> r~
>
--
Yosinori Sato
- [Qemu-devel] [PATCH RFC v3 09/11] RX Target hardware definition, (continued)
- [Qemu-devel] [PATCH RFC v3 09/11] RX Target hardware definition, Yoshinori Sato, 2019/03/02
- [Qemu-devel] [PATCH RFC v3 11/11] MAINTAINERS: Add RX entry., Yoshinori Sato, 2019/03/02
- [Qemu-devel] [PATCH RFC v3 02/11] target/rx: TCG helper, Yoshinori Sato, 2019/03/02
- [Qemu-devel] [PATCH RFC v3 05/11] target/rx: miscellaneous functions, Yoshinori Sato, 2019/03/02
- Re: [Qemu-devel] [PATCH RFC v3 00/11] Add RX archtecture support, no-reply, 2019/03/02
- Re: [Qemu-devel] [PATCH RFC v3 00/11] Add RX archtecture support, Philippe Mathieu-Daudé, 2019/03/02
- Re: [Qemu-devel] [PATCH RFC v3 00/11] Add RX archtecture support, Richard Henderson, 2019/03/07
- Re: [Qemu-devel] [PATCH RFC v3 00/11] Add RX archtecture support,
Yoshinori Sato <=
- [Qemu-devel] [PATCH RFC v4 00/12] Add RX archtecture support, Yoshinori Sato, 2019/03/20
- [Qemu-devel] [PATCH RFC v4 10/12] Add rx-softmmu, Yoshinori Sato, 2019/03/20
- [Qemu-devel] [PATCH RFC v4 11/12] MAINTAINERS: Add RX, Yoshinori Sato, 2019/03/20
- [Qemu-devel] [PATCH RFC v4 06/12] hw/intc: RX62N interrupt controller (ICUa), Yoshinori Sato, 2019/03/20
- [Qemu-devel] [PATCH RFC v4 12/12] include/hw/regiserfields.h: Add 8bit and 16bit registers, Yoshinori Sato, 2019/03/20
- [Qemu-devel] [PATCH RFC v4 05/12] target/rx: Miscellaneous files, Yoshinori Sato, 2019/03/20
- [Qemu-devel] [PATCH RFC v4 04/12] target/rx: RX disassembler, Yoshinori Sato, 2019/03/20