[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v4 19/45] target/arm: Implement SME MOVA
From: |
Richard Henderson |
Subject: |
Re: [PATCH v4 19/45] target/arm: Implement SME MOVA |
Date: |
Mon, 4 Jul 2022 14:38:30 +0530 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 |
On 7/1/22 21:49, Peter Maydell wrote:
On Tue, 28 Jun 2022 at 05:40, Richard Henderson
<richard.henderson@linaro.org> wrote:
We can reuse the SVE functions for implementing moves to/from
horizontal tile slices, but we need new ones for moves to/from
vertical tile slices.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index eef2df73e1..95159862de 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -19,8 +19,10 @@
#include "qemu/osdep.h"
#include "cpu.h"
-#include "internals.h"
Did you mean to delete this #include line ?
I meant to go back and remove it from whence it came, but kept forgetting.
+void HELPER(sme_mova_cz_d)(void *za, void *vn, void *vg, uint32_t desc)
+{
+ int i, oprsz = simd_oprsz(desc) / 8;
+ uint8_t *pg = vg;
+ uint64_t *n = vn;
+ uint64_t *a = za;
+
+ for (i = 0; i < oprsz; i++) {
+ if (pg[H1_2(i)] & 1) {
The documentation of the H macros says
"The H1_<N> macros are used when performing byte arithmetic and then
casting the final pointer to a type of size N."
but we're not casting anything to a 2-byte type, so what's happening here?
Yep, error.
+void HELPER(sme_mova_cz_q)(void *za, void *vn, void *vg, uint32_t desc)
+{
+ int i, oprsz = simd_oprsz(desc) / 16;
+ uint16_t *pg = vg;
+ Int128 *n = vn;
+ Int128 *a = za;
+
+ for (i = 0; i < oprsz; i++) {
+ if (pg[H2(i)] & 1) {
+ a[i * sizeof(ARMVectorReg)] = n[i];
Is it really OK to do this with an Int128 store? That is
in host-order, but the two halves of a 128-bit quantity
in the ZA array are in architectural order. I suppose the
source also will have them in the architectural order, but
it does look odd, especially uncommented.
Would memcpy be better for you?
+ /* Resolve tile.size[index] to an untyped ZA slice index. */
+ t_index = tcg_temp_new_i32();
+ tcg_gen_trunc_tl_i32(t_index, cpu_reg(s, rs));
+ tcg_gen_addi_i32(t_index, t_index, index);
This code isn't doing what the comment says; it's just calculating
the (not-yet-taken-MOD-dim) slice index, which does depend on the type.
I guess the comment applies to a larger section than just these two lines.
+
+ len = ctz32(streaming_vec_reg_size(s)) - esz;
What is this the length of ?
The length of the extract, aka the mod.
+ /* The tile slice offset within env->zarray is the column offset. */
+ offset = tile;
I don't understand why we can just add the tile index
(which is going to be an integer 0, 1, 2..) to a byte offset.
In the horizontal case we add tile * sizeof(ARMVectorReg),
which makes more sense to me.
Hmm. I think you're right this should be tile * column width, or
offset = tile << esz;
I wish I could compare vs FVP...
+ /*
+ * For big-endian, adjust the column slice offset within
+ * the uint64_t host words that make up env->zarray.
+ * This must wait until index and offset are combined.
Why? Neither the byte-offset of the start of the tile nor
the byte offset of zarray in CPUARMState ought to be non-8-aligned.
Columns will not be 8-aligned. On page 38 of 0616A.a, see the illustration of ZA0V.B[22],
which is 6 mod 8.
I'll try and improve the commentary.
r~