qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 19/45] target/arm: Implement SME MOVA


From: Richard Henderson
Subject: Re: [PATCH v4 19/45] target/arm: Implement SME MOVA
Date: Mon, 4 Jul 2022 14:38:30 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 7/1/22 21:49, Peter Maydell wrote:
On Tue, 28 Jun 2022 at 05:40, Richard Henderson
<richard.henderson@linaro.org> wrote:

We can reuse the SVE functions for implementing moves to/from
horizontal tile slices, but we need new ones for moves to/from
vertical tile slices.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---

diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index eef2df73e1..95159862de 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -19,8 +19,10 @@

  #include "qemu/osdep.h"
  #include "cpu.h"
-#include "internals.h"

Did you mean to delete this #include line ?

I meant to go back and remove it from whence it came, but kept forgetting.

+void HELPER(sme_mova_cz_d)(void *za, void *vn, void *vg, uint32_t desc)
+{
+    int i, oprsz = simd_oprsz(desc) / 8;
+    uint8_t *pg = vg;
+    uint64_t *n = vn;
+    uint64_t *a = za;
+
+    for (i = 0; i < oprsz; i++) {
+        if (pg[H1_2(i)] & 1) {

The documentation of the H macros says
"The H1_<N> macros are used when performing byte arithmetic and then
  casting the final pointer to a type of size N."
but we're not casting anything to a 2-byte type, so what's happening here?

Yep, error.

+void HELPER(sme_mova_cz_q)(void *za, void *vn, void *vg, uint32_t desc)
+{
+    int i, oprsz = simd_oprsz(desc) / 16;
+    uint16_t *pg = vg;
+    Int128 *n = vn;
+    Int128 *a = za;
+
+    for (i = 0; i < oprsz; i++) {
+        if (pg[H2(i)] & 1) {
+            a[i * sizeof(ARMVectorReg)] = n[i];

Is it really OK to do this with an Int128 store? That is
in host-order, but the two halves of a 128-bit quantity
in the ZA array are in architectural order. I suppose the
source also will have them in the architectural order, but
it does look odd, especially uncommented.

Would memcpy be better for you?

+    /* Resolve tile.size[index] to an untyped ZA slice index. */
+    t_index = tcg_temp_new_i32();
+    tcg_gen_trunc_tl_i32(t_index, cpu_reg(s, rs));
+    tcg_gen_addi_i32(t_index, t_index, index);

This code isn't doing what the comment says; it's just calculating
the (not-yet-taken-MOD-dim) slice index, which does depend on the type.

I guess the comment applies to a larger section than just these two lines.


+
+    len = ctz32(streaming_vec_reg_size(s)) - esz;

What is this the length of ?

The length of the extract, aka the mod.

+        /* The tile slice offset within env->zarray is the column offset. */
+        offset = tile;

I don't understand why we can just add the tile index
(which is going to be an integer 0, 1, 2..) to a byte offset.
In the horizontal case we add tile * sizeof(ARMVectorReg),
which makes more sense to me.

Hmm.  I think you're right this should be tile * column width, or

  offset = tile << esz;

I wish I could compare vs FVP...

+        /*
+         * For big-endian, adjust the column slice offset within
+         * the uint64_t host words that make up env->zarray.
+         * This must wait until index and offset are combined.

Why? Neither the byte-offset of the start of the tile nor
the byte offset of zarray in CPUARMState ought to be non-8-aligned.

Columns will not be 8-aligned. On page 38 of 0616A.a, see the illustration of ZA0V.B[22], which is 6 mod 8.

I'll try and improve the commentary.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]