qemu-devel

[PATCH v5 00/10] Optimize buffer_is_zero


From: Richard Henderson
Subject: [PATCH v5 00/10] Optimize buffer_is_zero
Date: Fri, 16 Feb 2024 14:39:08 -1000

v3: https://patchew.org/QEMU/20240206204809.9859-1-amonakov@ispras.ru/
v4: https://patchew.org/QEMU/20240215081449.848220-1-richard.henderson@linaro.org/

Changes for v5:
  - Move the 3-byte sample back inline; document it.
  - Drop the AArch64 SVE alternative; Neoverse V2 still recommends simd for memcpy.
  - Use UMAXV for the aarch64 simd reduction:
    3 cycles on Cortex-A76 and 2 cycles on Neoverse N1,
    compared with 4 cycles each for UQXTN or CMEQ+SHRN.
  - Add benchmark of zeros.
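A minimal C sketch of two of the ideas in the list above: an inline 3-byte sample (first, middle and last byte) in front of a full scan, and a vector loop that ORs data together and reduces only once at the end. The function names `buffer_is_zero_sketch` and `biz_body_sketch` are hypothetical, and this is not the code from the series. On AArch64 the wide loop assumes ACLE NEON intrinsics, where `vmaxvq_u32` is the intrinsic that maps to UMAXV; other targets fall back to 8-byte words.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical out-of-line body: checks every byte of buf[0..len). */
static bool biz_body_sketch(const unsigned char *buf, size_t len)
{
    size_t i = 0;
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* OR 16-byte vectors together, then reduce once with UMAXV
       (vmaxvq_u32): the largest lane is 0 only if every lane is 0. */
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 16 <= len; i += 16) {
        acc = vorrq_u32(acc, vreinterpretq_u32_u8(vld1q_u8(buf + i)));
    }
    if (vmaxvq_u32(acc) != 0) {
        return false;
    }
#else
    /* Portable fallback: OR 8-byte words; memcpy avoids unaligned
       and strict-aliasing problems. */
    uint64_t acc = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof(w));
        acc |= w;
    }
    if (acc != 0) {
        return false;
    }
#endif
    /* Tail bytes not covered by the wide loop. */
    for (; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical inline front end: sample 3 bytes before the full scan. */
static inline bool buffer_is_zero_sketch(const void *vbuf, size_t len)
{
    const unsigned char *buf = vbuf;

    if (len == 0) {
        return true;
    }
    /* Cheap rejection: non-zero data often shows up in one of these. */
    if (buf[0] || buf[len / 2] || buf[len - 1]) {
        return false;
    }
    /* For len <= 3 the three samples already cover every byte. */
    if (len <= 3) {
        return true;
    }
    return biz_body_sketch(buf, len);
}
```

Keeping the reduction outside the loop means the per-iteration cost is one load and one OR; the horizontal step (UMAXV here) runs once per call, which is why its latency matters less than the loop body but still shows up on short buffers.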

The benchmark is trivial and could be improved so that it
prints the name of the acceleration routine instead of its
index in the selection process.  But it is good enough to
see that #0 is faster than #1, etc.
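A minimal sketch of how a throughput figure like the ones below can be produced and printed by index, assuming a POSIX monotonic clock and 1 MB = 10^6 bytes. The `zero_fn` typedef, `zero_bytes` reference routine and the helper names are hypothetical; this is not the code in tests/bench/bufferiszero-bench.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical signature for a zero-test routine. */
typedef bool (*zero_fn)(const void *, size_t);

/* Byte-at-a-time reference routine so the sketch stands alone. */
static bool zero_bytes(const void *vbuf, size_t len)
{
    const unsigned char *p = vbuf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time 'iters' calls of fn on buf; return MB/sec (1 MB = 1e6 bytes).
   Returns 0.0 if the clock measured no elapsed time. */
static double bench_mb_per_sec(zero_fn fn, const void *buf, size_t len,
                               unsigned iters)
{
    /* Reading the pointer through a volatile forces a real call on
       every iteration instead of letting the compiler hoist it. */
    zero_fn volatile call = fn;
    volatile size_t hits = 0;
    double t0 = now_sec();
    for (unsigned i = 0; i < iters; i++) {
        hits += call(buf, len);
    }
    double elapsed = now_sec() - t0;
    return elapsed > 0 ? (double)len * iters / elapsed / 1e6 : 0.0;
}

/* Print one line per routine, labelled by its index in 'fns'. */
static void report_all(const zero_fn *fns, size_t n, const void *buf,
                       size_t len, unsigned iters)
{
    for (size_t i = 0; i < n; i++) {
        printf("buffer_is_zero #%zu: %.2f MB/sec\n", i,
               bench_mb_per_sec(fns[i], buf, len, iters));
    }
}
```

Printing a name instead of `#%zu` would only need a parallel array of strings next to `fns`.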

A sample set of results:

Apple M1:
  buffer_is_zero #0: 135416.27 MB/sec
  buffer_is_zero #1: 111771.25 MB/sec

Neoverse N1:
  buffer_is_zero #0: 56489.82 MB/sec
  buffer_is_zero #1: 36347.93 MB/sec

i7-1195G7:
  buffer_is_zero #0: 137327.40 MB/sec
  buffer_is_zero #1: 69159.20 MB/sec
  buffer_is_zero #2: 38319.80 MB/sec
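Dividing the figures above gives the relative speedup of each earlier routine over the later ones (simple ratios of the reported MB/sec, rounded to two decimals; derived numbers, not additional measurements):

```latex
\begin{aligned}
\text{Apple M1:}\quad    & \tfrac{\#0}{\#1} = \tfrac{135416.27}{111771.25} \approx 1.21 \\
\text{Neoverse N1:}\quad & \tfrac{\#0}{\#1} = \tfrac{56489.82}{36347.93} \approx 1.55 \\
\text{i7-1195G7:}\quad   & \tfrac{\#0}{\#1} = \tfrac{137327.40}{69159.20} \approx 1.99, \quad
                           \tfrac{\#1}{\#2} = \tfrac{69159.20}{38319.80} \approx 1.80, \quad
                           \tfrac{\#0}{\#2} = \tfrac{137327.40}{38319.80} \approx 3.58
\end{aligned}
```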


r~


Alexander Monakov (5):
  util/bufferiszero: Remove SSE4.1 variant
  util/bufferiszero: Remove AVX512 variant
  util/bufferiszero: Reorganize for early test for acceleration
  util/bufferiszero: Remove useless prefetches
  util/bufferiszero: Optimize SSE2 and AVX2 variants

Richard Henderson (5):
  util/bufferiszero: Improve scalar variant
  util/bufferiszero: Introduce biz_accel_fn typedef
  util/bufferiszero: Simplify test_buffer_is_zero_next_accel
  util/bufferiszero: Add simd acceleration for aarch64
  tests/bench: Add bufferiszero-bench
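A minimal sketch of the dispatch pattern suggested by the `biz_accel_fn` typedef and `test_buffer_is_zero_next_accel` entries above: an ordered table of routines, a selector that picks the first usable one, and a "next" step that falls back to the following entry so each routine can be tested or benchmarked in turn. The typedef's signature, the table, the `usable` flag (standing in for a runtime CPU-feature check) and every function name here are assumptions for illustration, not the series' code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed signature for the typedef named in the patch list. */
typedef bool (*biz_accel_fn)(const void *, size_t);

/* Baseline: one byte at a time. */
static bool zero_bytes(const void *vbuf, size_t len)
{
    const unsigned char *p = vbuf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Faster portable variant: OR 8-byte words, then the tail bytes. */
static bool zero_words(const void *vbuf, size_t len)
{
    const unsigned char *p = vbuf;
    uint64_t acc = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, sizeof(w));
        acc |= w;
    }
    for (; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/* Candidates in order of preference, fastest first.  Entry 0 is a
   placeholder for a SIMD routine assumed unavailable on this host. */
static const struct {
    biz_accel_fn fn;
    bool usable;
} accel_table[] = {
    { NULL, false },
    { zero_words, true },
    { zero_bytes, true },
};

static size_t accel_index;
static biz_accel_fn current_accel;

/* Select the first usable entry at or after 'start'; false if none. */
static bool select_accel_from(size_t start)
{
    size_t n = sizeof(accel_table) / sizeof(accel_table[0]);
    for (size_t i = start; i < n; i++) {
        if (accel_table[i].usable) {
            accel_index = i;
            current_accel = accel_table[i].fn;
            return true;
        }
    }
    return false;
}

/* Step to the next (slower) usable routine; false when exhausted. */
static bool next_accel_sketch(void)
{
    return select_accel_from(accel_index + 1);
}
```

A test or benchmark would call `select_accel_from(0)` once, exercise `current_accel`, and loop while `next_accel_sketch()` returns true.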

 include/qemu/cutils.h            |  32 ++-
 tests/bench/bufferiszero-bench.c |  42 +++
 util/bufferiszero.c              | 449 +++++++++++++++++--------------
 tests/bench/meson.build          |   4 +-
 4 files changed, 319 insertions(+), 208 deletions(-)
 create mode 100644 tests/bench/bufferiszero-bench.c

-- 
2.34.1
