Re: [PATCH v3 04/37] target/ppc: vmulh* instructions use gvec

qemu-ppc

From:	Richard Henderson
Subject:	Re: [PATCH v3 04/37] target/ppc: vmulh* instructions use gvec
Date:	Fri, 11 Feb 2022 14:51:34 +1100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

On 2/10/22 23:34, matheus.ferst@eldorado.org.br wrote:

+static void do_vx_vmulhu_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec a1, b1, mask, w, k;
+    unsigned bits;
+    bits = (vece == MO_32) ? 16 : 32;
+
+    a1 = tcg_temp_new_vec_matching(t);
+    b1 = tcg_temp_new_vec_matching(t);
+    w  = tcg_temp_new_vec_matching(t);
+    k  = tcg_temp_new_vec_matching(t);
+    mask = tcg_temp_new_vec_matching(t);
+
+    tcg_gen_dupi_vec(vece, mask, (vece == MO_32) ? 0xFFFF : 0xFFFFFFFF);
+    tcg_gen_and_vec(vece, a1, a, mask);
+    tcg_gen_and_vec(vece, b1, b, mask);
+    tcg_gen_mul_vec(vece, t, a1, b1);
+    tcg_gen_shri_vec(vece, k, t, bits);
+
+    tcg_gen_shri_vec(vece, a1, a, bits);
+    tcg_gen_mul_vec(vece, t, a1, b1);
+    tcg_gen_add_vec(vece, t, t, k);
+    tcg_gen_and_vec(vece, k, t, mask);
+    tcg_gen_shri_vec(vece, w, t, bits);
+
+    tcg_gen_and_vec(vece, a1, a, mask);
+    tcg_gen_shri_vec(vece, b1, b, bits);
+    tcg_gen_mul_vec(vece, t, a1, b1);
+    tcg_gen_add_vec(vece, t, t, k);
+    tcg_gen_shri_vec(vece, k, t, bits);
+
+    tcg_gen_shri_vec(vece, a1, a, bits);
+    tcg_gen_mul_vec(vece, t, a1, b1);
+    tcg_gen_add_vec(vece, t, t, w);
+    tcg_gen_add_vec(vece, t, t, k);

I don't think that you should decompose 4 high-part 32-bit multiplies into 4 32-bit multiplies plus lots of arithmetic. This is not a win. You're actually better off with pure integer arithmetic here.

You could instead widen these into 2 64-bit multiplies, plus some arithmetic. That's certainly closer to the break-even point.
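
For reference, the per-element arithmetic behind the widening suggestion can be sketched in plain C. This is only an illustration of the math, not TCG code: the names mulhu32 and mulhs32 are hypothetical and do not appear in the patch. Each 32-bit element is extended to 64 bits, multiplied once, and the upper half of the product is kept.

```c
#include <stdint.h>

/* Hypothetical helper: high 32 bits of the unsigned 32x32 product,
 * obtained from a single 64-bit multiply. */
static inline uint32_t mulhu32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Hypothetical helper: signed variant. Both operands are sign-extended
 * before the multiply. The shift is done on the unsigned bit pattern to
 * avoid relying on implementation-defined signed right shift. */
static inline int32_t mulhs32(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(uint32_t)((uint64_t)p >> 32);
}
```

With four 32-bit elements per vector, two multiplies on 64-bit elements cover all four products, which is where the "2 64-bit multiplies" count comes from.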

+        {
+            .fniv = do_vx_vmulhu_vec,
+            .fno  = gen_helper_VMULHUD,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };

As for the two high-part 64-bit multiplies, I think that should definitely remain an integer operation.


You probably want to expand these with inline integer operations using .fni[48].
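
As a rough reference for what the integer expansion has to compute per 64-bit element, here is a portable C sketch. The names mulhu64 and mulhs64 are hypothetical, and this is not the code an .fni8 callback would contain; in TCG the double-word multiply ops tcg_gen_mulu2_i64 and tcg_gen_muls2_i64 return both halves of the product directly, so an inline expansion would presumably use those and keep only the high half.

```c
#include <stdint.h>

/* Hypothetical helper: high 64 bits of the unsigned 64x64 product,
 * built from four 32x32->64 partial products. */
static inline uint64_t mulhu64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t ll = a_lo * b_lo;
    uint64_t hl = a_hi * b_lo;
    uint64_t lh = a_lo * b_hi;
    uint64_t hh = a_hi * b_hi;

    /* Sum of the middle column; at most 3 * (2^32 - 1), so no overflow. */
    uint64_t mid = (ll >> 32) + (uint32_t)hl + (uint32_t)lh;

    return hh + (hl >> 32) + (lh >> 32) + (mid >> 32);
}

/* Hypothetical helper: signed high part derived from the unsigned one.
 * Reading a negative operand as unsigned adds 2^64 to it, which adds the
 * other operand to the high half of the product; subtract that back out. */
static inline int64_t mulhs64(int64_t a, int64_t b)
{
    uint64_t hi = mulhu64((uint64_t)a, (uint64_t)b);

    if (a < 0) {
        hi -= (uint64_t)b;
    }
    if (b < 0) {
        hi -= (uint64_t)a;
    }
    return (int64_t)hi;
}
```

The four partial products here are the same decomposition the quoted vector code performs, which illustrates why a single double-word integer multiply per element is the cheaper route on hosts that have one.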

+static void do_vx_vmulhs_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)


Very much likewise.


r~
