[PULL 25/27] target/arm: Optimize MVE VSHLL and VMOVL

qemu-devel

From:	Peter Maydell
Subject:	[PULL 25/27] target/arm: Optimize MVE VSHLL and VMOVL
Date:	Mon, 20 Sep 2021 15:19:45 +0100

Optimize the MVE VSHLL insns by using TCG vector ops when possible.
This includes the VMOVL insn, which we handle in mve.decode as "VSHLL
with zero shift count".

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-11-peter.maydell@linaro.org
---
 target/arm/translate-mve.c | 67 +++++++++++++++++++++++++++++++++-----
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 00fa4379a74..5d66f70657e 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -1735,16 +1735,67 @@ DO_2SHIFT_SCALAR(VQSHL_U_scalar, vqshli_u)
 DO_2SHIFT_SCALAR(VQRSHL_S_scalar, vqrshli_s)
 DO_2SHIFT_SCALAR(VQRSHL_U_scalar, vqrshli_u)
 
-#define DO_VSHLL(INSN, FN)                                      \
-    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
-    {                                                           \
-        static MVEGenTwoOpShiftFn * const fns[] = {             \
-            gen_helper_mve_##FN##b,                             \
-            gen_helper_mve_##FN##h,                             \
-        };                                                      \
-        return do_2shift(s, a, fns[a->size], false);            \
+#define DO_VSHLL(INSN, FN)                                              \
+    static bool trans_##INSN(DisasContext *s, arg_2shift *a)            \
+    {                                                                   \
+        static MVEGenTwoOpShiftFn * const fns[] = {                     \
+            gen_helper_mve_##FN##b,                                     \
+            gen_helper_mve_##FN##h,                                     \
+        };                                                              \
+        return do_2shift_vec(s, a, fns[a->size], false, do_gvec_##FN);  \
     }
 
+/*
+ * For the VSHLL vector helpers, the vece is the size of the input
+ * (ie MO_8 or MO_16); the helpers want to work in the output size.
+ * The shift count can be 0..<input size>, inclusive. (0 is VMOVL.)
+ */
+static void do_gvec_vshllbs(unsigned vece, uint32_t dofs, uint32_t aofs,
+                            int64_t shift, uint32_t oprsz, uint32_t maxsz)
+{
+    unsigned ovece = vece + 1;
+    unsigned ibits = vece == MO_8 ? 8 : 16;
+    tcg_gen_gvec_shli(ovece, dofs, aofs, ibits, oprsz, maxsz);
+    tcg_gen_gvec_sari(ovece, dofs, dofs, ibits - shift, oprsz, maxsz);
+}
+
+static void do_gvec_vshllbu(unsigned vece, uint32_t dofs, uint32_t aofs,
+                            int64_t shift, uint32_t oprsz, uint32_t maxsz)
+{
+    unsigned ovece = vece + 1;
+    tcg_gen_gvec_andi(ovece, dofs, aofs,
+                      ovece == MO_16 ? 0xff : 0xffff, oprsz, maxsz);
+    tcg_gen_gvec_shli(ovece, dofs, dofs, shift, oprsz, maxsz);
+}
+
+static void do_gvec_vshllts(unsigned vece, uint32_t dofs, uint32_t aofs,
+                            int64_t shift, uint32_t oprsz, uint32_t maxsz)
+{
+    unsigned ovece = vece + 1;
+    unsigned ibits = vece == MO_8 ? 8 : 16;
+    if (shift == 0) {
+        tcg_gen_gvec_sari(ovece, dofs, aofs, ibits, oprsz, maxsz);
+    } else {
+        tcg_gen_gvec_andi(ovece, dofs, aofs,
+                          ovece == MO_16 ? 0xff00 : 0xffff0000, oprsz, maxsz);
+        tcg_gen_gvec_sari(ovece, dofs, dofs, ibits - shift, oprsz, maxsz);
+    }
+}
+
+static void do_gvec_vshlltu(unsigned vece, uint32_t dofs, uint32_t aofs,
+                            int64_t shift, uint32_t oprsz, uint32_t maxsz)
+{
+    unsigned ovece = vece + 1;
+    unsigned ibits = vece == MO_8 ? 8 : 16;
+    if (shift == 0) {
+        tcg_gen_gvec_shri(ovece, dofs, aofs, ibits, oprsz, maxsz);
+    } else {
+        tcg_gen_gvec_andi(ovece, dofs, aofs,
+                          ovece == MO_16 ? 0xff00 : 0xffff0000, oprsz, maxsz);
+        tcg_gen_gvec_shri(ovece, dofs, dofs, ibits - shift, oprsz, maxsz);
+    }
+}
+
 DO_VSHLL(VSHLL_BS, vshllbs)
 DO_VSHLL(VSHLL_BU, vshllbu)
 DO_VSHLL(VSHLL_TS, vshllts)
-- 
2.20.1
