
Re: [PATCH v5 57/60] target/riscv: vector slide instructions


From: LIU Zhiwei
Subject: Re: [PATCH v5 57/60] target/riscv: vector slide instructions
Date: Mon, 16 Mar 2020 16:04:49 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0



On 2020/3/15 13:16, Richard Henderson wrote:
On 3/12/20 7:58 AM, LIU Zhiwei wrote:
+#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)                    \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t offset = s1, i;                                              \
+                                                                          \
+    if (offset > vl) {                                                    \
+        offset = vl;                                                      \
+    }                                                                     \
This isn't right.

+    for (i = 0; i < vl; i++) {                                            \
+        if (((i < offset)) || (!vm && !vext_elem_mask(v0, mlen, i))) {    \
+            continue;                                                     \
+        }                                                                 \
+        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));          \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
You need to eliminate vl == 0 first, not last.
Then

     for (i = offset; i < vl; i++)

The types of i and vl need to be extended to target_ulong, so that you don't
incorrectly crop the input offset.
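
Concretely, a possible shape for the reworked loop (a sketch only, not
tested; it reuses the locals the macro already declares and keeps the
same tail clearing as the other helpers in this patch):

    target_ulong i, offset = s1;

    if (vl == 0) {
        return;
    }
    for (i = offset; i < vl; i++) {
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;
        }
        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));
    }
    CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));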

It may be worth special-casing vm=1, or hoisting it out of the loop.  The
operation becomes a memcpy (at least for little-endian) at that point.  See
swap_memmove in arm/sve_helper.c.
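
For example (sketch only; this assumes a little-endian host, where H() is
an identity mapping -- a big-endian host would want something like
swap_memmove instead):

    if (vm && offset < vl) {
        /* unmasked case: one contiguous copy of vl - offset elements */
        memmove((ETYPE *)vd + offset, (ETYPE *)vs2,
                (vl - offset) * sizeof(ETYPE));
    } else {
        /* masked element-by-element loop as above */
    }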


+#define GEN_VEXT_VSLIDEDOWN_VX(NAME, ETYPE, H, CLEAR_FN)                  \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t offset = s1, i;                                              \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        if (i + offset < vlmax) {                                         \
+            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));      \
Again, eliminate vl == 0 first.  In fact, why don't we make that a global
request for all of the patches for the next revision.  Checking for i == 0 last
is silly, and checks for the zero twice: once in the loop bounds and again at
the end.

It is probably worth changing the loop bounds to

     if (offset >= vlmax) {
        max = 0;
     } else {
        max = MIN(vl, vlmax - offset);
     }
     for (i = 0; i < max; ++i)


+        } else {                                                          \
+            *((ETYPE *)vd + H(i)) = 0;                                    \
+        }
Which lets these zeros merge into...

+    for (; i < vlmax; i++) {                                              \
+        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
+    }                                                                     \
These zeros.
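
Something along these lines, as a sketch (a new local 'max', with i and
offset widened to target_ulong as above; the exact CLEAR_FN start point
and how it interacts with masked-off elements in [max, vl) would still
need a closer look):

    max = (offset >= vlmax) ? 0 : MIN(vl, vlmax - offset);
    for (i = 0; i < max; ++i) {
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;
        }
        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));
    }
    /* out-of-range and tail elements are then zeroed by a single call */
    CLEAR_FN(vd, max, max * sizeof(ETYPE), vlmax * sizeof(ETYPE));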

+#define GEN_VEXT_VSLIDE1UP_VX(NAME, ETYPE, H, CLEAR_FN)                   \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
+    uint32_t mlen = vext_mlen(desc);                                      \
+    uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;                   \
+    uint32_t vm = vext_vm(desc);                                          \
+    uint32_t vl = env->vl;                                                \
+    uint32_t i;                                                           \
+                                                                          \
+    for (i = 0; i < vl; i++) {                                            \
+        if (!vm && !vext_elem_mask(v0, mlen, i)) {                        \
+            continue;                                                     \
+        }                                                                 \
+        if (i == 0) {                                                     \
+            *((ETYPE *)vd + H(i)) = s1;                                   \
+        } else {                                                          \
+            *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - 1));           \
+        }                                                                 \
+    }                                                                     \
+    if (i == 0) {                                                         \
+        return;                                                           \
+    }                                                                     \
+    for (; i < vlmax; i++) {                                              \
+        CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));      \
+    }                                                                     \
+}
As a preference, I think you can do away with this helper.
Simply use the slideup helper with argument 1, and then
afterwards store the integer register into element 0.  You should be able to
re-use code from vmv.s.x for that.
When I try it, I find it somewhat difficult, because vmv.s.x will clear
the elements (0 < index < VLEN/SEW).

Zhiwei
+#define GEN_VEXT_VSLIDE1DOWN_VX(NAME, ETYPE, H, CLEAR_FN)                 \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
+        CPURISCVState *env, uint32_t desc)                                \
+{                                                                         \
Likewise.


r~



