From: Richard Henderson
Subject: Re: [PATCH v4 09/10] util/bufferiszero: Add simd acceleration for aarch64
Date: Thu, 15 Feb 2024 11:10:59 -1000
User-agent: Mozilla Thunderbird

On 2/15/24 08:46, Alexander Monakov wrote:
> Right, so we can pick the cheapest reduction method, and if I'm reading Neoverse-N1 SOG right, SHRN is marginally cheaper than ADDV (latency 2 instead of 3), and it should be generally preferable on other cores, no?

Fair.
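
To make the comparison concrete, here is a portable scalar model of the two reductions in C. It is a sketch, not QEMU's actual code: it assumes the 128-bit accumulator has already been OR-reduced and is viewed as 16 little-endian bytes, the function names are hypothetical, and each loop only models what the named instruction computes, not its cost. The SHRN variant relies on the CMEQ mask: after CMEQ every byte is 0x00 or 0xFF, so shifting each 16-bit lane right by 4 and truncating keeps one nibble per input byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Scalar model (hypothetical helper names) of reducing a 128-bit vector,
 * given as 16 little-endian bytes, to "is every byte zero?".
 */

/* Models CMEQ Vd.16B, Vn.16B, #0: 0xFF where the byte is zero, else 0x00. */
static void model_cmeq_zero(const uint8_t v[16], uint8_t m[16])
{
    for (int i = 0; i < 16; i++) {
        m[i] = (v[i] == 0) ? 0xFF : 0x00;
    }
}

/* Models CMEQ + ADDV Bd, Vn.16B: the byte sum wraps modulo 256. */
static bool model_zero_cmeq_addv(const uint8_t v[16])
{
    uint8_t m[16];
    uint8_t sum = 0;

    model_cmeq_zero(v, m);
    for (int i = 0; i < 16; i++) {
        sum = (uint8_t)(sum + m[i]);
    }
    /* k zero bytes sum to -k mod 256, so all 16 zero gives 0xF0. */
    return sum == (uint8_t)-16;
}

/* Models CMEQ + SHRN Vd.8B, Vn.8H, #4: one nibble per input byte survives. */
static bool model_zero_cmeq_shrn(const uint8_t v[16])
{
    uint8_t m[16];
    uint64_t narrowed = 0;

    model_cmeq_zero(v, m);
    for (int i = 0; i < 8; i++) {
        /* 16-bit lane i: byte 2i is the low half, byte 2i+1 the high half. */
        uint16_t lane = (uint16_t)(m[2 * i] | (m[2 * i + 1] << 8));
        /* Shift right by 4, then keep the low 8 bits (the "narrow"). */
        uint8_t out = (uint8_t)(lane >> 4);
        narrowed |= (uint64_t)out << (8 * i);
    }
    /* Every nibble is 0xF only if every input byte was zero. */
    return narrowed == UINT64_MAX;
}
```

Applied to the raw data instead of the CMEQ mask, SHRN #4 would drop bits 0-3 and 12-15 of each lane, which is why the mask step stays in this variant.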

> For that matter, cannot UQXTN (unsigned saturating extract narrow) be used in place of CMEQ+ADDV here?

Interesting. I hadn't thought about using saturation to preserve non-zeroness like that. Using one 4-cycle insn instead of two 2-cycle insns is interesting as well. I suppose that, since it's at the end of the dependency chain, the fact that it is restricted to the V1 pipe matters not at all.
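
A similar scalar sketch shows why UQXTN can stand in for CMEQ+ADDV. Again the helper names are hypothetical and only lane semantics are modeled, not QEMU's code: saturating a 16-bit lane to 8 bits yields min(x, 255), which is zero exactly when x is zero, so the OR-accumulated vector narrows to a 64-bit value that is zero iff all 128 bits were. The ACLE intrinsic for this form is vqmovn_u16; the 16-to-8-bit lane width is an assumption, and a plain truncating narrow (XTN) is included only to show what goes wrong without saturation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: 16-bit lane i of a 16-byte little-endian vector. */
static uint16_t lane16(const uint8_t v[16], int i)
{
    return (uint16_t)(v[2 * i] | (v[2 * i + 1] << 8));
}

/*
 * Models UQXTN Vd.8B, Vn.8H (ACLE: vqmovn_u16): each lane becomes
 * min(lane, 255), which is nonzero exactly when the lane is nonzero.
 */
static bool model_zero_uqxtn(const uint8_t v[16])
{
    uint64_t narrowed = 0;

    for (int i = 0; i < 8; i++) {
        uint16_t lane = lane16(v, i);
        uint8_t out = (lane > UINT8_MAX) ? UINT8_MAX : (uint8_t)lane;
        narrowed |= (uint64_t)out << (8 * i);
    }
    /* One 64-bit test on the narrowed value; no CMEQ mask is needed. */
    return narrowed == 0;
}

/*
 * Models plain XTN Vd.8B, Vn.8H (truncating narrow) for contrast:
 * a lane of 0x0100 narrows to 0x00, so nonzero data can be missed.
 */
static bool model_zero_xtn_broken(const uint8_t v[16])
{
    uint64_t narrowed = 0;

    for (int i = 0; i < 8; i++) {
        narrowed |= (uint64_t)(uint8_t)lane16(v, i) << (8 * i);
    }
    return narrowed == 0;
}
```

Since saturation preserves non-zeroness at any lane width, a wider form such as vqmovn_u64 (UQXTN Vd.2S, Vn.2D) should work equally well for this test; which width is cheapest is a separate, microarchitecture-specific question.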
r~