qemu-devel

Re: [PATCH v2 2/3] target/riscv: update Zb[abcs] to 1.0.0 (public review) specification


From: Richard Henderson
Subject: Re: [PATCH v2 2/3] target/riscv: update Zb[abcs] to 1.0.0 (public review) specification
Date: Wed, 18 Aug 2021 15:09:33 -1000

On 8/18/21 10:32 AM, Philipp Tomsich wrote:
The ratification package for Zb[abcs] does not contain all instructions
that have been added to QEMU and does not define misa.B for these: the
individual extensions are now Zba, Zbb, Zbc and Zbs.

Some of the instructions that had previously been added and now need to
be dropped are:
  - shift-one instructions
  - generalized reverse and or-combine
  - w-forms of single-bit instructions
  - w-form of rev8


Do not try to do this all in one patch.  It's too large to review that way.

The following have been adjusted:
  - rori and slli.uw only accept a 6-bit shamt field
    (if the bit that is reserved for a future 7-bit shamt for RV128 is
     set, the encoding is illegal on RV64)

The gen_shifti helper should be taking care of testing that the shamt is in range. You really should match the base shift instructions here.
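
To make the range test concrete, here is a minimal standalone sketch in plain C. It assumes the decoder hands over the full 7-bit shift-amount field (instruction bits 26:20) and that legality is simply "shamt < XLEN", which is the same rule the base shift-immediate instructions follow. The names decode_shamt7, shamt_is_legal and shamt_self_check are hypothetical and are not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: extract the 7-bit shamt field (bits 26:20). */
unsigned decode_shamt7(uint32_t insn)
{
    return (insn >> 20) & 0x7f;
}

/*
 * Hypothetical helper: a shift amount is legal only when it is smaller
 * than the register width.  On RV64 this rejects any encoding that has
 * bit 26 set (the bit reserved for a 7-bit shamt on RV128).
 */
bool shamt_is_legal(unsigned shamt, unsigned xlen)
{
    return shamt < xlen;
}

/* Small usage example. */
void shamt_self_check(void)
{
    assert(shamt_is_legal(63, 64));   /* largest legal RV64 shamt */
    assert(!shamt_is_legal(64, 64));  /* bit 6 set: illegal on RV64 */
    assert(!shamt_is_legal(32, 32));  /* bit 5 set: illegal on RV32 */
}
```

With one shared check of this shape, rori and slli.uw need no special-case test of their own.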


-static bool trans_grevi(DisasContext *ctx, arg_grevi *a)
+static void gen_orc_b(TCGv ret, TCGv source1)
 {
-    REQUIRE_EXT(ctx, RVB);
-
-    if (a->shamt >= TARGET_LONG_BITS) {
-        return false;
-    }
-
-    return gen_grevi(ctx, a);
+    TCGv  tmp = tcg_temp_new();
+    tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0x5555555555555555LL
+                                                           : 0x55555555);
+    tcg_gen_shli_tl(tmp, tmp, 1);
+    tcg_gen_or_tl(source1, source1, tmp);
+    tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0xaaaaaaaaaaaaaaaaLL
+                                                           : 0xaaaaaaaa);
+    tcg_gen_shri_tl(tmp, tmp, 1);
+    tcg_gen_or_tl(source1, source1, tmp);
+    tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0x3333333333333333LL
+                                                           : 0x33333333);
+    tcg_gen_shli_tl(tmp, tmp, 2);
+    tcg_gen_or_tl(source1, source1, tmp);
+    tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0xccccccccccccccccLL
+                                                           : 0xcccccccc);
+    tcg_gen_shri_tl(tmp, tmp, 2);
+    tcg_gen_or_tl(source1, source1, tmp);
+    tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0x0f0f0f0f0f0f0f0fLL
+                                                           : 0x0f0f0f0f);
+    tcg_gen_shli_tl(tmp, tmp, 4);
+    tcg_gen_or_tl(source1, source1, tmp);
+    tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0xf0f0f0f0f0f0f0f0LL
+                                                           : 0xf0f0f0f0);
+    tcg_gen_shri_tl(tmp, tmp, 4);
+    tcg_gen_or_tl(ret, source1, tmp);
 }

You can use the simpler algorithm from
 https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord

  /* Set msb in each byte if the byte was zero. */
  tcg_gen_subi_tl(tmp, src1, dup_const(MO_8, 0x01));
  tcg_gen_andc_tl(tmp, tmp, src1);
  tcg_gen_andi_tl(tmp, tmp, dup_const(MO_8, 0x80));
  /* Replicate the msb of each byte across the byte. */
  tcg_gen_shri_tl(tmp, tmp, 7);
  tcg_gen_muli_tl(dest, tmp, 0xff);
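
Two details matter when adapting a zero-byte test to orc.b. First, orc.b fills the bytes that are nonzero, so the sense of the test has to be "byte is nonzero". Second, the subtract-based zero-byte expression can flag a nonzero byte that sits above a zero byte, because the borrow propagates across byte boundaries (with input 0x0100, byte 1 is flagged as well). The plain-C sketch below shows one borrow-free alternative, checked against a straightforward per-byte loop. It is an illustration under those assumptions, not the code that was merged; the names orc_b_ref, orc_b_swar and orc_b_self_check are hypothetical, and a 64-bit register width is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: examine each byte in turn. */
uint64_t orc_b_ref(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8) {
        if ((x >> i) & 0xff) {
            r |= (uint64_t)0xff << i;
        }
    }
    return r;
}

/* Branch-free version with no carry or borrow between bytes. */
uint64_t orc_b_swar(uint64_t x)
{
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;

    /* Adding 0x7f sets the msb of a byte iff its low 7 bits are nonzero;
       the per-byte sum is at most 0xfe, so nothing carries out. */
    uint64_t t = (x & low7) + low7;
    /* Also catch bytes whose only set bit is the msb itself. */
    t |= x;
    /* Keep just the msb of each byte and move it down to the lsb. */
    t &= ~low7;
    t >>= 7;
    /* Replicate the lsb of each byte across the whole byte. */
    return t * 0xff;
}

/* Small usage example. */
void orc_b_self_check(void)
{
    assert(orc_b_swar(0x0100) == 0xff00);
    assert(orc_b_swar(0x0100) == orc_b_ref(0x0100));
}
```

The same sequence maps onto and, add, or, andc, shift and multiply-immediate operations, so it stays about as short as the snippet above.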



+static void gen_clmulx(DisasContext *ctx, arg_r *a, bool reverse)
+{
+    TCGv source1 = tcg_temp_new();
+    TCGv source2 = tcg_temp_new();
+    TCGv zeroreg = tcg_const_tl(0);
+    TCGv t0 = tcg_temp_new();
+    TCGv t1 = tcg_temp_new();
+    TCGv result = tcg_temp_new();
+
+    gen_get_gpr(source1, a->rs1);
+    gen_get_gpr(source2, a->rs2);
+    tcg_gen_movi_tl(result, 0);
+
+    for (int i = 0; i < TARGET_LONG_BITS; i++) {
+        tcg_gen_shri_tl(t0, source2, i);
+        if (reverse) {
+            tcg_gen_shri_tl(t1, source1, TARGET_LONG_BITS - i - 1);
+        } else {
+            tcg_gen_shli_tl(t1, source1, i);
+        }
+        tcg_gen_andi_tl(t0, t0, 1);
+        tcg_gen_xor_tl(t1, result, t1);
+        tcg_gen_movcond_tl(TCG_COND_NE, result, t0, zeroreg, t1, result);
+    }
+
+    gen_set_gpr(a->rd, result);
+    tcg_temp_free(source1);
+    tcg_temp_free(source2);
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(zeroreg);
+    tcg_temp_free(result);
+}

This inline is way too large -- up to 384 instructions.
Use a couple of out-of-line helpers.
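
As a rough sketch of what such out-of-line helpers could compute, the two loops from the patch can be written as ordinary C functions, so the translator emits a single call instead of unrolling 64 iterations of TCG ops. The names clmul64, clmulr64 and clmul_self_check are hypothetical, a 64-bit register width is assumed, and the QEMU helper declaration and call plumbing is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-less multiply, low half: XOR in rs1 << i for every set bit i of rs2. */
uint64_t clmul64(uint64_t rs1, uint64_t rs2)
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        if ((rs2 >> i) & 1) {
            result ^= rs1 << i;
        }
    }
    return result;
}

/* Reversed form: mirrors the "reverse" branch of the quoted loop,
   which shifts rs1 right by (width - i - 1) instead of left by i. */
uint64_t clmulr64(uint64_t rs1, uint64_t rs2)
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        if ((rs2 >> i) & 1) {
            result ^= rs1 >> (64 - i - 1);
        }
    }
    return result;
}

/* Small usage example: (x + 1) * (x + 1) = x^2 + 1 over GF(2). */
void clmul_self_check(void)
{
    assert(clmul64(3, 3) == 5);
}
```

The loop body is the same work as in the inline version; the difference is that it runs as compiled host code rather than being expanded into every translated block.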


r~
