Re: [Qemu-devel] [PATCH v5 33/35] target/arm: Implement SVE dot product
From: Peter Maydell
Subject: Re: [Qemu-devel] [PATCH v5 33/35] target/arm: Implement SVE dot product (indexed)
Date: Tue, 26 Jun 2018 17:30:02 +0100
On 26 June 2018 at 17:17, Richard Henderson
<address@hidden> wrote:
> On 06/26/2018 08:30 AM, Peter Maydell wrote:
>> On 21 June 2018 at 02:53, Richard Henderson
>> <address@hidden> wrote:
>>> Signed-off-by: Richard Henderson <address@hidden>
>>> ---
>>> target/arm/helper.h | 5 ++
>>> target/arm/translate-sve.c | 18 +++++++
>>> target/arm/vec_helper.c | 96 ++++++++++++++++++++++++++++++++++++++
>>> target/arm/sve.decode | 8 +++-
>>> 4 files changed, 126 insertions(+), 1 deletion(-)
>>>
>>
>>> +void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
>>> +{
>>> +    intptr_t i, j, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4;
>>> +    intptr_t index = simd_data(desc);
>>> +    uint32_t *d = vd;
>>> +    int8_t *n = vn, *m = vm;
>>> +
>>> +    for (i = 0; i < opr_sz_4; i = j) {
>>> +        int8_t m0 = m[(i + index) * 4 + 0];
>>> +        int8_t m1 = m[(i + index) * 4 + 1];
>>> +        int8_t m2 = m[(i + index) * 4 + 2];
>>> +        int8_t m3 = m[(i + index) * 4 + 3];
>>> +
>>> +        j = i;
>>> +        do {
>>> +            d[j] += n[j * 4 + 0] * m0
>>> +                  + n[j * 4 + 1] * m1
>>> +                  + n[j * 4 + 2] * m2
>>> +                  + n[j * 4 + 3] * m3;
>>> +        } while (++j < MIN(i + 4, opr_sz_4));
>>> +    }
>>> +    clear_tail(d, opr_sz, simd_maxsz(desc));
>>> +}
>>
>> Maybe I'm just half asleep this afternoon, but this is pretty
>> confusing -- nested loops where the outer loop's increment
>> uses the inner loop's index, and the inner loop's conditions
>> depend on the outer loop index...
>
> Yeah, well.
>
> There is an edge case of aa64 advsimd, reusing this same helper,
>
> sdot v0.2s, v1.8b, v0.4b[0]
>
> where m values must be read (and held) before writing d results,
> and there are not 16/4=4 elements to process but only 2.
>
> I suppose I could special-case oprsz == 8 in order to simplify
> iteration of what is otherwise a multiple of 16.
>
> I thought iterating J from I to I+4 was easier to read than
> writing out I+J everywhere. Perhaps not.

Mmm. I did indeed fail to notice the symmetry between the
indexes into m[] and those into n[].

The other bit that threw me is where the outer loop on i
updates using j.

A comment describing the intent might assist?
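
For illustration only, here is a rough standalone sketch of how the
intent could be spelled out in comments. It is not the patch itself:
the function name sdot_idx_b_sketch and the variables seg and end are
made up, the descriptor decoding (simd_oprsz, simd_data) is replaced by
plain parameters, and the clear_tail() step is left out. The outer loop
steps by whole 128-bit segments instead of reusing the inner index,
which should visit the same elements as the i = j form.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the indexed signed dot product.
 * opr_sz is the operation size in bytes; index selects one group of
 * four bytes within each 128-bit segment of m.
 */
void sdot_idx_b_sketch(uint32_t *d, const int8_t *n, const int8_t *m,
                       size_t opr_sz, size_t index)
{
    size_t opr_sz_4 = opr_sz / 4;   /* number of 32-bit result elements */
    size_t seg, e;

    /* Outer loop: one pass per 128-bit segment (up to 4 results). */
    for (seg = 0; seg < opr_sz_4; seg += 4) {
        /*
         * Read and hold the four selected m bytes before storing to d:
         * d may overlap m, as in sdot v0.2s, v1.8b, v0.4b[0].
         */
        int8_t m0 = m[(seg + index) * 4 + 0];
        int8_t m1 = m[(seg + index) * 4 + 1];
        int8_t m2 = m[(seg + index) * 4 + 2];
        int8_t m3 = m[(seg + index) * 4 + 3];

        /*
         * The segment may be short: the 64-bit AdvSIMD form has only
         * 2 result elements, so clamp the end to opr_sz_4.
         */
        size_t end = seg + 4 < opr_sz_4 ? seg + 4 : opr_sz_4;

        /* Inner loop: every result in the segment uses the same m0..m3. */
        for (e = seg; e < end; e++) {
            d[e] += n[e * 4 + 0] * m0
                  + n[e * 4 + 1] * m1
                  + n[e * 4 + 2] * m2
                  + n[e * 4 + 3] * m3;
        }
    }
}
```

With opr_sz == 8 the outer loop runs once and the inner loop stops
after two elements, which is the edge case described above.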
thanks
-- PMM