Re: [PATCH v1 03/20] s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL
From: Richard Henderson
Subject: Re: [PATCH v1 03/20] s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL
Date: Thu, 1 Oct 2020 10:26:36 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0
On 9/30/20 9:55 AM, David Hildenbrand wrote:
> + /* Multiply both even elements from v2 and v3 */
> + read_vec_element_i64(l1, get_field(s, v2), 0, ES_64);
> + read_vec_element_i64(h1, get_field(s, v3), 0, ES_64);
> + tcg_gen_mulu2_i64(l1, h1, l1, h1);
> + /* Shift result left by one bit if requested */
> + if (extract32(get_field(s, m6), 3, 1)) {
> + tcg_gen_extract2_i64(h1, l1, h1, 63);
> + tcg_gen_shli_i64(l1, l1, 1);
> + }
Not a bug, but some hosts require 3 insns for extract2 (so 4 total for this
sequence).
This doubling can also be had via add2:
tcg_gen_add2_i64(l1, h1, l1, h1, l1, h1);
At which point most hosts will require only 2 insns for this sequence. The two
hosts that don't have a carry bit (mips, riscv) will still be able to perform
the add in 3 insns.
So add is never more expensive and sometimes half as expensive.
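For illustration only, here is a plain C sketch of why the two forms agree: adding a 128-bit value to itself with carry propagation is the same as shifting it left by one bit. The helper names (shl1_extract2, shl1_add2, shl1_variants_agree) are hypothetical and not part of QEMU; they only model the arithmetic on a hi:lo pair, not the TCG ops themselves or the host code they expand to.

```c
#include <stdint.h>

/* Hypothetical model of the extract2 + shli sequence: shift hi:lo left by 1. */
static void shl1_extract2(uint64_t *lo, uint64_t *hi)
{
    /* New high half: old high shifted up, top bit of low shifted in. */
    *hi = (*hi << 1) | (*lo >> 63);
    *lo <<= 1;
}

/* Hypothetical model of add2 with both addends equal to hi:lo. */
static void shl1_add2(uint64_t *lo, uint64_t *hi)
{
    uint64_t l = *lo, h = *hi;
    uint64_t sum_lo = l + l;          /* wraps modulo 2^64 */
    uint64_t carry = sum_lo < l;      /* 1 iff the low add overflowed */

    *hi = h + h + carry;
    *lo = sum_lo;
}

/* Returns 1 if both variants produce the same 128-bit result. */
static int shl1_variants_agree(uint64_t lo, uint64_t hi)
{
    uint64_t l1 = lo, h1 = hi, l2 = lo, h2 = hi;

    shl1_extract2(&l1, &h1);
    shl1_add2(&l2, &h2);
    return l1 == l2 && h1 == h2;
}
```

The carry out of the low half is exactly bit 63 of the original low word, which is the bit extract2 moves into the high half.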
Regardless,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~