qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 4/5] target-tricore: Add instructions of BIT opc


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH 4/5] target-tricore: Add instructions of BIT opcode format
Date: Sat, 27 Sep 2014 22:22:25 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

On 09/27/2014 07:58 AM, Bastian Koppelmann wrote:
> +/* D[c] = D[c][0] op1 (D[a][pos1] op2 D[b][pos2]);*/
> +static inline void gen_bit_2op(TCGv ret, TCGv r1, TCGv r2, TCGv r3,
> +                               int pos1, int pos2,
> +                               void(*op1)(TCGv, TCGv, TCGv),
> +                               void(*op2)(TCGv, TCGv, TCGv))
> +{
> +    TCGv temp1, temp2, temp3;
> +
> +    temp1 = tcg_temp_new();
> +    temp2 = tcg_temp_new();
> +    temp3 = tcg_temp_new();
> +
> +    tcg_gen_andi_tl(temp3, r3, 0x1);
> +
> +    tcg_gen_andi_tl(temp2, r2 , (0x1u << pos2));
> +    tcg_gen_shri_tl(temp2, temp2, pos2);
> +
> +    tcg_gen_andi_tl(temp1, r1, (0x1u << pos1));
> +    tcg_gen_shri_tl(temp1, temp1, pos1);
> +
> +    (*op1)(temp1, temp1, temp2);
> +    (*op2)(ret , temp3, temp1);

This incorrectly clobbers bits 1:31 of ret.  You want

  shri tmp1, r2, pos2
  shri tmp1, r1, pos1
  op1(tmp1, tmp1, tmp2)
  op2(tmp1, r3, tmp1)
  deposit ret, ret, tmp1, 0, 1


> +    TCGv temp1, temp2;
> +
> +    temp1 = tcg_temp_new();
> +    temp2 = tcg_temp_new();
> +
> +    tcg_gen_andi_tl(temp2, r2, (0x1u << pos2));
> +    tcg_gen_shri_tl(temp2, temp2, pos2);
> +
> +    tcg_gen_andi_tl(temp1, r1, (0x1u << pos1));
> +    tcg_gen_shri_tl(temp1, temp1, pos1);
> +
> +    (*op1)(ret, temp1, temp2);

This one, though, does get to set the whole register.  That said,
I think you should *not* mask the two inputs, but instead mask the one output.
 That saves one operation, and allows NOR to not need a special case.

> +    case OPC2_32_BIT_AND_NOR_T:
> +        gen_bit_2op(temp, cpu_gpr_d[r1], cpu_gpr_d[r2], cpu_gpr_d[r3],
> +                    pos1, pos2, &tcg_gen_or_tl, &tcg_gen_andc_tl);
> +        break;

Without trying to take into account the properties of the tcg backend, this
seems less than ideal.  Yes, it's correct, but so is

        tcg_gen_nor_tl, tcg_gen_and_tl

which matches the name of the instruction.

If the tcg backend is sparc or ppc, it's more efficient too, since nor is
actually present in the isa.  But of course, arm and even haswell x86 have andc
but don't have nor.  This is stuff that a normal compiler optimizer could sort
out, but we don't have that for tcg.

I'd be willing to accept something conditionalized on TCG_TARGET_HAS_nor_i32,
if you like, but otherwise just match the instruction.

> +static void decode_bit_insert(CPUTriCoreState *env, DisasContext *ctx)

It's probably better to implement this with one right-shift and one deposit.
Certainly would be easier to read and follow.

> +    case OPC2_32_BIT_XNOR_T:
> +        gen_bit_1op(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2],
> +                    pos1, pos2, &tcg_gen_xor_tl);

tcg_gen_eqv_tl

> +    case OPC2_32_BIT_OR_NOR_T:
> +        gen_bit_2op(temp, cpu_gpr_d[r1], cpu_gpr_d[r2], cpu_gpr_d[r3],
> +                    pos1, pos2, &tcg_gen_or_tl, &tcg_gen_orc_tl);
> +        break;

Again, probably better with nor + or, or conditionalization.

> +    case OPC2_32_BIT_SH_XNOR_T:
> +        gen_bit_1op(temp, cpu_gpr_d[r1], cpu_gpr_d[r2],
> +                    pos1, pos2, &tcg_gen_xor_tl);
> +        tcg_gen_not_tl(temp, temp);

Again, eqv.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]