qemu-devel

Re: [PATCH v4 00/10] Optimize buffer_is_zero

From:	Richard Henderson
Subject:	Re: [PATCH v4 00/10] Optimize buffer_is_zero
Date:	Thu, 15 Feb 2024 11:16:53 -1000
User-agent:	Mozilla Thunderbird

On 2/14/24 22:57, Alexander Monakov wrote:
> On Wed, 14 Feb 2024, Richard Henderson wrote:
>
>> v3: https://patchew.org/QEMU/20240206204809.9859-1-amonakov@ispras.ru/
>>
>> Changes for v4:
>>    - Keep separate >= 256 entry point, but only keep constant length
>>      check inline.  This allows the indirect function call to be hidden
>>      and optimized away when the pointer is constant.
>
> Sorry, I don't understand this. Most of the improvement (at least in our
> testing) comes from inlining the byte checks, which often fail and eliminate
> call overhead entirely. Moving them out-of-line seems to lose most of the
> speedup the patchset was bringing, doesn't it? Is there some concern I am
> not seeing?


What is your benchmarking method?

It was my guess that most of the improvement came from performing those
early byte checks *at all*, and that the overhead of a function call to a
small out-of-line wrapper would be negligible.
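
For readers following the thread, a minimal sketch of what "early byte checks" in a small out-of-line wrapper could look like. The function names and the choice of sampled bytes (first, middle, last) are assumptions for illustration, not the code from the patch set:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the accelerated full scan. */
static bool scan_all_zero(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Out-of-line wrapper (hypothetical name).  It samples a few bytes
 * first; a non-zero buffer is often rejected here, so the full scan
 * is skipped and only the cost of this one call remains.
 */
bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    if (len == 0) {
        return true;
    }
    /* Assumed sampling points: first, middle and last byte. */
    if (p[0] | p[len / 2] | p[len - 1]) {
        return false;
    }
    return scan_all_zero(p, len);
}
```

Whether the remaining call overhead matters compared with inlining these checks at every call site is the question the benchmark would need to answer.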

By not exposing the function pointer outside the bufferiszero translation
unit, the compiler can see when the pointer is never modified for a given
host, and then transform the indirect branch to a direct branch.
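
A minimal sketch of the pattern described above, assuming a file-local pointer with a single initializer. All names are hypothetical, and whether the call is actually turned into a direct one depends on the compiler and optimization level:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical generic implementation, the only one on this "host". */
static bool scan_generic(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * The pointer has internal linkage and its address never escapes this
 * file.  With no store to it anywhere in the translation unit, an
 * optimizing compiler may prove it always equals scan_generic and
 * replace the indirect call below with a direct call or inline it.
 */
static bool (*accel_fn)(const void *, size_t) = scan_generic;

/* Hypothetical out-of-line entry point for larger buffers. */
bool sketch_is_zero_ge256(const void *buf, size_t len)
{
    return accel_fn(buf, len);
}
```

On a host where the file also contains code that reassigns `accel_fn` (for example after CPU feature detection), the compiler can no longer make that assumption and the indirect call stays.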

r~
