From: Richard Henderson
Subject: Re: [PATCH v2 2/3] target/riscv: update Zb[abcs] to 1.0.0 (public review) specification
Date: Wed, 18 Aug 2021 15:09:33 -1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0
On 8/18/21 10:32 AM, Philipp Tomsich wrote:
The ratification package for Zb[abcs] does not contain all instructions
that have been added to QEMU and does not define misa.B for these: the
individual extensions are now Zba, Zbb, Zbc and Zbs.
Some of the instructions that had previously been added and now need to
be dropped are:
- shift-one instructions
- generalized reverse and or-combine
- w-forms of single-bit instructions
- w-form of rev8
Do not try to do this all in one patch. It's too large to review that way.
The following have been adjusted:
- rori and slli.uw only accept a 6-bit shamt field
(if the bit that is reserved for a future 7-bit shamt for RV128 is
set, the encoding is illegal on RV64)
The gen_shifti helper should take care of testing that the shamt is in range.
You really should match the base shift instructions here.
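A minimal sketch of the range test being referred to, in plain C rather than QEMU code. The function name shamt_in_range is hypothetical, and the decoded shamt field is assumed to be 7 bits wide, so that the bit reserved for RV128 appears as a value of 64 or more on RV64:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU function: report whether an immediate
 * shift amount is legal for the given register width (xlen is 32 or 64).
 * Assumes the decoder hands over the full 7-bit field, so a set RV128
 * bit makes shamt >= 64 and the same comparison rejects it on RV64.
 */
static bool shamt_in_range(uint32_t shamt, uint32_t xlen)
{
    return shamt < xlen;
}
```

With a single comparison against the register width, rori and slli.uw would need no per-instruction check for the reserved bit.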
-static bool trans_grevi(DisasContext *ctx, arg_grevi *a)
+static void gen_orc_b(TCGv ret, TCGv source1)
{
- REQUIRE_EXT(ctx, RVB);
-
- if (a->shamt >= TARGET_LONG_BITS) {
- return false;
- }
-
- return gen_grevi(ctx, a);
+ TCGv tmp = tcg_temp_new();
+ tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0x5555555555555555LL
+ : 0x55555555);
+ tcg_gen_shli_tl(tmp, tmp, 1);
+ tcg_gen_or_tl(source1, source1, tmp);
+ tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0xaaaaaaaaaaaaaaaaLL
+ : 0xaaaaaaaa);
+ tcg_gen_shri_tl(tmp, tmp, 1);
+ tcg_gen_or_tl(source1, source1, tmp);
+ tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0x3333333333333333LL
+ : 0x33333333);
+ tcg_gen_shli_tl(tmp, tmp, 2);
+ tcg_gen_or_tl(source1, source1, tmp);
+ tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0xccccccccccccccccLL
+ : 0xcccccccc);
+ tcg_gen_shri_tl(tmp, tmp, 2);
+ tcg_gen_or_tl(source1, source1, tmp);
+ tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0x0f0f0f0f0f0f0f0fLL
+ : 0x0f0f0f0f);
+ tcg_gen_shli_tl(tmp, tmp, 4);
+ tcg_gen_or_tl(source1, source1, tmp);
+ tcg_gen_andi_tl(tmp, source1, (TARGET_LONG_BITS == 64) ? 0xf0f0f0f0f0f0f0f0LL
+ : 0xf0f0f0f0);
+ tcg_gen_shri_tl(tmp, tmp, 4);
+ tcg_gen_or_tl(ret, source1, tmp);
}
You can use the simpler algorithm from
https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
/* Set msb in each byte if the byte was zero. */
tcg_gen_subi_tl(tmp, src1, dup_const(MO_8, 0x01));
tcg_gen_andc_tl(tmp, tmp, src1);
tcg_gen_andi_tl(tmp, tmp, dup_const(MO_8, 0x80));
/* Replicate the msb of each byte across the byte. */
tcg_gen_shri_tl(tmp, tmp, 7);
tcg_gen_muli_tl(dest, tmp, 0xff);
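A plain-C sketch of the same idea, for checking the bit trick outside of TCG. Two points are assumptions of this sketch rather than part of the snippet above. First, orc.b wants 0xff in every byte that is non-zero, so the sketch marks non-zero bytes instead of zero bytes. Second, it uses an add on the low 7 bits instead of the subtract, because a borrow out of a zero byte can reach the next byte (0x0100 is one such input). The names orc_b_ref and orc_b_swar are hypothetical:

```c
#include <stdint.h>

/* Reference: each byte becomes 0xff if any bit in it is set, else 0x00. */
static uint64_t orc_b_ref(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        if ((x >> (8 * i)) & 0xff) {
            r |= (uint64_t)0xff << (8 * i);
        }
    }
    return r;
}

/* Branch-free variant working on all eight bytes at once. */
static uint64_t orc_b_swar(uint64_t x)
{
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;

    /*
     * Adding 0x7f to the low 7 bits of a byte sets bit 7 exactly when
     * those bits are non-zero; the sum is at most 0xfe, so nothing
     * carries into the neighbouring byte.
     */
    uint64_t t = (x & low7) + low7;

    /* Also catch bytes whose only set bit is bit 7. */
    t |= x;

    /* Move the per-byte flag from bit 7 down to bit 0. */
    t = (t & ~low7) >> 7;

    /* Replicate the flag across the byte: 0x01 * 0xff == 0xff. */
    return t * 0xff;
}
```

Comparing orc_b_swar against orc_b_ref over a set of inputs is a cheap way to validate whichever constant-time form ends up in the translator.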
+static void gen_clmulx(DisasContext *ctx, arg_r *a, bool reverse)
+{
+ TCGv source1 = tcg_temp_new();
+ TCGv source2 = tcg_temp_new();
+ TCGv zeroreg = tcg_const_tl(0);
+ TCGv t0 = tcg_temp_new();
+ TCGv t1 = tcg_temp_new();
+ TCGv result = tcg_temp_new();
+
+ gen_get_gpr(source1, a->rs1);
+ gen_get_gpr(source2, a->rs2);
+ tcg_gen_movi_tl(result, 0);
+
+ for (int i = 0; i < TARGET_LONG_BITS; i++) {
+ tcg_gen_shri_tl(t0, source2, i);
+ if (reverse) {
+ tcg_gen_shri_tl(t1, source1, TARGET_LONG_BITS - i - 1);
+ } else {
+ tcg_gen_shli_tl(t1, source1, i);
+ }
+ tcg_gen_andi_tl(t0, t0, 1);
+ tcg_gen_xor_tl(t1, result, t1);
+ tcg_gen_movcond_tl(TCG_COND_NE, result, t0, zeroreg, t1, result);
+ }
+
+ gen_set_gpr(a->rd, result);
+ tcg_temp_free(source1);
+ tcg_temp_free(source2);
+ tcg_temp_free(t0);
+ tcg_temp_free(t1);
+ tcg_temp_free(zeroreg);
+ tcg_temp_free(result);
+}
This inline is way too large -- up to 384 instructions.
Use a couple of out-of-line helpers.
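As a rough illustration of what such out-of-line helpers could compute, here is a sketch in plain C over 64-bit values. The names clmul64 and clmulr64 are hypothetical, and the QEMU helper declaration and call glue are deliberately left out; the loop bodies mirror the shifts in the quoted gen_clmulx:

```c
#include <stdint.h>

/* Carry-less multiply: XOR of rs1 shifted left by each set bit of rs2. */
static uint64_t clmul64(uint64_t rs1, uint64_t rs2)
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        if ((rs2 >> i) & 1) {
            result ^= rs1 << i;
        }
    }
    return result;
}

/* Reversed form: same loop, but rs1 is shifted right by (63 - i). */
static uint64_t clmulr64(uint64_t rs1, uint64_t rs2)
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        if ((rs2 >> i) & 1) {
            result ^= rs1 >> (64 - i - 1);
        }
    }
    return result;
}
```

The loop then runs once in compiled host code, and the translator only has to emit a single call per instruction instead of unrolling all iterations into the translation block.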
r~