qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtens


From: Richard Henderson
Subject: Re: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core
Date: Sun, 25 Jul 2021 03:08:07 -1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 7/5/21 1:34 PM, Taylor Simpson wrote:
HVX is a set of wide vector instructions.  Machine state includes
     vector registers (VRegs)
     vector predicate registers (QRegs)
     temporary registers for intermediate values
     store buffer (masked stores and scatter/gather)

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
  target/hexagon/cpu.h            | 35 ++++++++++++++++-
  target/hexagon/hex_arch_types.h |  5 +++
  target/hexagon/insn.h           |  3 ++
  target/hexagon/internal.h       |  3 ++
  target/hexagon/mmvec/mmvec.h    | 83 +++++++++++++++++++++++++++++++++++++++++
  target/hexagon/cpu.c            | 72 ++++++++++++++++++++++++++++++++++-
  6 files changed, 198 insertions(+), 3 deletions(-)
  create mode 100644 target/hexagon/mmvec/mmvec.h

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 2855dd3..0b377c3 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -26,6 +26,7 @@ typedef struct CPUHexagonState CPUHexagonState;
  #include "qemu-common.h"
  #include "exec/cpu-defs.h"
  #include "hex_regs.h"
+#include "mmvec/mmvec.h"
#define NUM_PREGS 4
  #define TOTAL_PER_THREAD_REGS 64
@@ -34,6 +35,7 @@ typedef struct CPUHexagonState CPUHexagonState;
  #define STORES_MAX 2
  #define REG_WRITES_MAX 32
  #define PRED_WRITES_MAX 5                   /* 4 insns + endloop */
+#define VSTORES_MAX 2
#define TYPE_HEXAGON_CPU "hexagon-cpu" @@ -52,6 +54,13 @@ typedef struct {
      uint64_t data64;
  } MemLog;
+typedef struct {
+    target_ulong va;
+    int size;
+    MMVector mask QEMU_ALIGNED(16);
+    MMVector data QEMU_ALIGNED(16);
+} VStoreLog;

Do you really need a MMVector mask, or should this be a QRegMask?

-    target_ulong gather_issued;
+    bool gather_issued;

Surely unrelated to adding state.

+    MMVector VRegs[NUM_VREGS] QEMU_ALIGNED(16);
+    MMVector future_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
+    MMVector tmp_VRegs[NUM_VREGS] QEMU_ALIGNED(16);

Ok, this is where I'm not keen on how you handle this for integer code, and for vector code has got to be past the realm of acceptable.

You have exactly 4 slots in your vliw packet. You cannot possibly use 32 future or tmp slots. For integers this wastage was at least small, but for vectors these waste just shy of 8k.

All you need to do is to be smarter about mapping e.g. 5 to 8 temp slots during 
translation.

+    MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
+    MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);

Likewise.

+    /* Temporaries used within instructions */
+    MMVector zero_vector QEMU_ALIGNED(16);

You must be doing something wrong to need zero in memory.

+/*
+ * The HVX register dump takes up a ton of space in the log
+ * Don't print it unless it is needed
+ */
+#define DUMP_HVX 0
+#if DUMP_HVX
+    qemu_fprintf(f, "Vector Registers = {\n");
+    for (int i = 0; i < NUM_VREGS; i++) {
+        print_vreg(f, env, i);
+    }
+    for (int i = 0; i < NUM_QREGS; i++) {
+        print_qreg(f, env, i);
+    }
+    qemu_fprintf(f, "}\n");
+#endif

Use CPU_DUMP_FPU, controlled by -d fpu.

-    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
+    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS + NUM_QREGS;

They're not really core regs though are they?
Surely gdb_register_coprocessor is a better representation.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]