From: Richard Henderson
Subject: Re: [Qemu-arm] [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking
Date: Fri, 1 Jul 2016 15:07:50 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1

On 06/30/2016 06:45 AM, Peter Maydell wrote:
> On 29 June 2016 at 09:47,  <address@hidden> wrote:
>> From: Vijay <address@hidden>
>>
>> Use Neon instructions to perform zero checking of
>> buffer. This helps reduce total migration time.
>>
>> diff --git a/util/cutils.c b/util/cutils.c
>> index 5830a68..4779403 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -184,6 +184,13 @@ int qemu_fdatasync(int fd)
>>  #define SPLAT(p)       _mm_set1_epi8(*(p))
>>  #define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
>>  #define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
>> +#elif __aarch64__
>> +#include "arm_neon.h"
>> +#define VECTYPE        uint64x2_t
>> +#define ALL_EQ(v1, v2) \
>> +        ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
>> +         (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
>> +#define VEC_OR(v1, v2) ((v1) | (v2))
>
> Should be '#elif defined(__aarch64__)'. I have made this
> tweak and put this patch in target-arm.next.

Consider XORing the two vectors and reducing the result with a
horizontal max instead:

#define VECTYPE        uint32x4_t
#define ALL_EQ(v1, v2) (vmaxvq_u32((v1) ^ (v2)) == 0)


which compiles down to

  1c:   6e211c00        eor     v0.16b, v0.16b, v1.16b
  20:   6eb0a800        umaxv   s0, v0.4s
  24:   1e260000        fmov    w0, s0
  28:   6b1f001f        cmp     w0, wzr
  2c:   1a9f17e0        cset    w0, eq
  30:   d65f03c0        ret

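vs the lane-by-lane comparison from the patch, which needs two
vector-to-GPR moves per lane and a branch: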

  34:   4e083c20        mov     x0, v1.d[0]
  38:   4e083c01        mov     x1, v0.d[0]
  3c:   eb00003f        cmp     x1, x0
  40:   52800000        mov     w0, #0
  44:   54000040        b.eq    4c <f0+0x18>
  48:   d65f03c0        ret
  4c:   4e183c20        mov     x0, v1.d[1]
  50:   4e183c01        mov     x1, v0.d[1]
  54:   eb00003f        cmp     x1, x0
  58:   1a9f17e0        cset    w0, eq
  5c:   d65f03c0        ret
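
For anyone who wants to compare the two forms side by side outside of
cutils.c, here is a minimal sketch. The function names all_eq_lanes and
all_eq_umaxv are made up for illustration and are not part of the patch.
It spells the XOR as veorq_u32 rather than the GCC vector-extension `^`,
and the scalar fallback is there only so the file also builds on
non-AArch64 hosts.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Form used in the patch: compare the two 64-bit lanes one at a time. */
static bool all_eq_lanes(const uint8_t *a, const uint8_t *b)
{
    uint64x2_t v1 = vreinterpretq_u64_u8(vld1q_u8(a));
    uint64x2_t v2 = vreinterpretq_u64_u8(vld1q_u8(b));

    return vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0) &&
           vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1);
}

/* Suggested form: XOR, then horizontal unsigned max across four lanes.
 * The max is zero only if every bit of the XOR is zero. */
static bool all_eq_umaxv(const uint8_t *a, const uint8_t *b)
{
    uint32x4_t v1 = vreinterpretq_u32_u8(vld1q_u8(a));
    uint32x4_t v2 = vreinterpretq_u32_u8(vld1q_u8(b));

    return vmaxvq_u32(veorq_u32(v1, v2)) == 0;
}

#else /* hypothetical scalar stand-ins with the same semantics */

static bool all_eq_lanes(const uint8_t *a, const uint8_t *b)
{
    uint64_t x[2], y[2];

    memcpy(x, a, sizeof(x));
    memcpy(y, b, sizeof(y));
    return x[0] == y[0] && x[1] == y[1];
}

static bool all_eq_umaxv(const uint8_t *a, const uint8_t *b)
{
    uint32_t x[4], y[4], max = 0;

    memcpy(x, a, sizeof(x));
    memcpy(y, b, sizeof(y));
    for (int i = 0; i < 4; i++) {
        uint32_t d = x[i] ^ y[i];   /* nonzero wherever the inputs differ */
        if (d > max) {
            max = d;
        }
    }
    return max == 0;
}
#endif
```

Both functions take pointers to 16 bytes and should agree on every
input. The only intended difference is the generated code: the second
form is straight-line, with no branch between the lanes.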


r~


