[PATCH v5 00/10] Optimize buffer_is_zero
From: Richard Henderson
Subject: [PATCH v5 00/10] Optimize buffer_is_zero
Date: Fri, 16 Feb 2024 14:39:08 -1000
v3: https://patchew.org/QEMU/20240206204809.9859-1-amonakov@ispras.ru/
v4: https://patchew.org/QEMU/20240215081449.848220-1-richard.henderson@linaro.org/
Changes for v5:
- Move 3-byte sample back inline; document it.
- Drop AArch64 SVE alternative; neoverse-v2 still recommends simd for memcpy.
- Use UMAXV for aarch64 simd reduction:
  3 cycles on cortex-a76, 2 cycles on neoverse-n1,
  as compared to UQXTN or CMEQ+SHRN at 4 cycles each.
- Add benchmark of zeros.
  The benchmark is trivial, and could be improved so that it
  prints the name of the acceleration routine instead of its
  index in the selection process. But it is good enough to
  see that #0 is faster than #1, etc.
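The inline 3-byte sample and the UMAXV reduction listed above can be illustrated together with a minimal sketch. Everything here is hypothetical: the names sketch_buffer_is_zero and sketch_is_zero_ool, the 16-byte chunking, and which three bytes are sampled (here the first, last and middle) are assumptions for illustration, not the code in this series. On AArch64 the out-of-line path ORs 16-byte vectors together and then does a single horizontal max (vmaxvq_u8, which compiles to UMAXV); on other hosts it falls back to a scalar loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/*
 * Hypothetical out-of-line path, only reached with len >= 4.
 * On AArch64, OR-accumulate 16-byte vectors, then reduce once with UMAXV.
 */
static bool sketch_is_zero_ool(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    assert(len >= 4);
#if defined(__aarch64__)
    uint8x16_t acc = vdupq_n_u8(0);
    for (; i + 16 <= len; i += 16) {
        acc = vorrq_u8(acc, vld1q_u8(buf + i));
    }
    /* UMAXV: the horizontal max is nonzero iff some lane is nonzero. */
    if (vmaxvq_u8(acc) != 0) {
        return false;
    }
#endif
    /* Scalar tail on AArch64, or the whole buffer on other hosts. */
    uint8_t t = 0;
    for (; i < len; i++) {
        t |= buf[i];
    }
    return t == 0;
}

/* Hypothetical inline front end: sample 3 bytes before any call. */
static inline bool sketch_buffer_is_zero(const void *vbuf, size_t len)
{
    const uint8_t *buf = vbuf;

    if (len == 0) {
        return true;
    }
    /* Most non-zero buffers are rejected here, without a call. */
    if (buf[0] | buf[len - 1] | buf[len / 2]) {
        return false;
    }
    /* For len <= 3 the three samples cover every byte. */
    if (len <= 3) {
        return true;
    }
    return sketch_is_zero_ool(buf, len);
}
```

The coverage comment holds because for len 1, 2 and 3 the sampled indices are {0}, {0, 1} and {0, 1, 2} respectively, so the inline part alone is exact for tiny buffers.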
A sample set:

  Apple M1:
    buffer_is_zero #0: 135416.27 MB/sec
    buffer_is_zero #1: 111771.25 MB/sec

  Neoverse N1:
    buffer_is_zero #0: 56489.82 MB/sec
    buffer_is_zero #1: 36347.93 MB/sec

  i7-1195G7:
    buffer_is_zero #0: 137327.40 MB/sec
    buffer_is_zero #1: 69159.20 MB/sec
    buffer_is_zero #2: 38319.80 MB/sec
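A throughput figure like the ones above can be produced by timing repeated calls on an all-zero buffer. The following is only a sketch under stated assumptions: is_zero_fn, scalar_is_zero and bench_mb_per_sec are hypothetical names, "MB" is taken to mean 10^6 bytes, and timing uses POSIX clock_gettime(CLOCK_MONOTONIC); the actual tests/bench/bufferiszero-bench.c may measure differently.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature shared by all candidate routines. */
typedef bool (*is_zero_fn)(const void *, size_t);

/* Hypothetical stand-in for one acceleration routine. */
static bool scalar_is_zero(const void *vbuf, size_t len)
{
    const uint8_t *p = vbuf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Time `iters` calls on an all-zero buffer; return MB/sec (MB = 1e6 bytes). */
static double bench_mb_per_sec(is_zero_fn fn, const void *buf, size_t len,
                               unsigned iters)
{
    struct timespec t0, t1;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iters; i++) {
        /* A false result on a zero buffer means the routine is broken. */
        if (!fn(buf, len)) {
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    if (secs <= 0.0) {
        secs = 1e-9; /* guard against a zero-length interval */
    }
    return (double)len * iters / secs / 1e6;
}
```

Calling bench_mb_per_sec once per candidate routine and printing the result next to its index gives output in the same shape as the sample set.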
r~
Alexander Monakov (5):
  util/bufferiszero: Remove SSE4.1 variant
  util/bufferiszero: Remove AVX512 variant
  util/bufferiszero: Reorganize for early test for acceleration
  util/bufferiszero: Remove useless prefetches
  util/bufferiszero: Optimize SSE2 and AVX2 variants

Richard Henderson (5):
  util/bufferiszero: Improve scalar variant
  util/bufferiszero: Introduce biz_accel_fn typedef
  util/bufferiszero: Simplify test_buffer_is_zero_next_accel
  util/bufferiszero: Add simd acceleration for aarch64
  tests/bench: Add bufferiszero-bench
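The "index in the selection process" printed by the benchmark suggests a table of routines tried best-first, stepped through by something like test_buffer_is_zero_next_accel. The sketch below is loosely modeled on the biz_accel_fn typedef and test_buffer_is_zero_next_accel named in the shortlog, but sketch_accel_fn, sketch_next_accel, the two routines and their signatures are all hypothetical; real selection would also consult host CPU features, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical analogue of a biz_accel_fn-style function pointer type. */
typedef bool (*sketch_accel_fn)(const void *, size_t);

/* 8 bytes at a time; memcpy gives an alignment-safe load. */
static bool is_zero_words(const void *vbuf, size_t len)
{
    const unsigned char *p = vbuf;
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, sizeof(w));
        acc |= w;
    }
    for (; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/* Byte-at-a-time fallback. */
static bool is_zero_bytes(const void *vbuf, size_t len)
{
    const unsigned char *p = vbuf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Ordered best-first: index 0 is the preferred routine. */
static const sketch_accel_fn accel_table[] = { is_zero_words, is_zero_bytes };
static const size_t accel_count = sizeof(accel_table) / sizeof(accel_table[0]);
static size_t accel_index;

static bool sketch_is_zero(const void *buf, size_t len)
{
    assert(accel_index < accel_count);
    return accel_table[accel_index](buf, len);
}

/* Step to the next (slower) routine; false when none remain. */
static bool sketch_next_accel(void)
{
    if (accel_index + 1 >= accel_count) {
        return false;
    }
    accel_index++;
    return true;
}
```

A loop of the form `do { measure and print "#index" } while (sketch_next_accel());` would yield labels #0, #1, ... in selection order, as in the sample set, though the actual bench may be structured differently.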
 include/qemu/cutils.h            |  32 ++-
 tests/bench/bufferiszero-bench.c |  42 +++
 util/bufferiszero.c              | 449 +++++++++++++++++--------------
 tests/bench/meson.build          |   4 +-
 4 files changed, 319 insertions(+), 208 deletions(-)
 create mode 100644 tests/bench/bufferiszero-bench.c
--
2.34.1