Re: [Qemu-arm] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla
From: Peter Maydell
Subject: Re: [Qemu-arm] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla
Date: Fri, 26 Jan 2018 10:07:14 +0000
On 26 January 2018 at 07:29, Richard Henderson
<address@hidden> wrote:
> On 01/15/2018 10:18 AM, Peter Maydell wrote:
>>> +void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm,
>>> +                         void *vfpst, uint32_t desc)
>>> +{
>>> +    uintptr_t opr_sz = simd_oprsz(desc);
>>> +    float16 *d = vd;
>>> +    float16 *n = vn;
>>> +    float16 *m = vm;
>>> +    float_status *fpst = vfpst;
>>> +    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
>>> +    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
>>> +    uint32_t neg_real = flip ^ neg_imag;
>>> +    uintptr_t i;
>>> +
>>> +    neg_real <<= 15;
>>> +    neg_imag <<= 15;
>>> +
>>> +    for (i = 0; i < opr_sz / 2; i += 2) {
>>> +        float16 e0 = n[H2(i + flip)];
>>> +        float16 e1 = m[H2(i + flip)] ^ neg_real;
>>> +        float16 e2 = e0;
>>> +        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
>>
>> This is again rather confusing to compare against the pseudocode.
>> What order are your e0/e1/e2/e3 compared to the pseudocode's
>> element1/element2/element3/element4 ?
>
> The SVE pseudocode for the same operation is clearer than that in the main ARM
> ARM, and is nearer to what I used:
>
> for e = 0 to elements-1
>     if ElemP[mask, e, esize] == '1' then
>         pair = e - (e MOD 2);  // index of first element in pair
>         addend = Elem[result, e, esize];
>         if IsEven(e) then  // real part
>             // realD = realA [+-] flip ? (imagN * imagM) : (realN * realM)
>             element1 = Elem[operand1, pair + flip, esize];
>             element2 = Elem[operand2, pair + flip, esize];
>             if neg_real then element2 = FPNeg(element2);
>         else  // imaginary part
>             // imagD = imagA [+-] flip ? (imagN * realM) : (realN * imagM)
>             element1 = Elem[operand1, pair + flip, esize];
>             element2 = Elem[operand2, pair + (1 - flip), esize];
>             if neg_imag then element2 = FPNeg(element2);
>         Elem[result, e, esize] = FPMulAdd(addend, element1, element2, FPCR);
>
> In my version, e0/e1 are element1/element2 (real) and e2/e3 are
> element1/element2 (imag).
Thanks. Could we use the same numbering (element1/element2/element3/element4)
as the final Arm ARM pseudocode?
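
For reference, a minimal standalone sketch of how e0/e1/e2/e3 in the quoted
helper line up with the quoted SVE pseudocode. It is an illustration only,
not the patch: the name fcmla_sketch and the rot parameter are hypothetical,
it assumes the two descriptor bits hold the rotation (bit 0 = flip, bit 1 =
neg_imag), and it uses plain float arithmetic where the real helper would use
float16 and a softfloat fused multiply-add with a float_status. It also keeps
the e0..e3 names from the patch rather than the Arm ARM element numbering.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: accumulate one FCMLA rotation into d[], where
 * d, n and m hold (real, imag) pairs and 'elements' is the total count.
 * Assumption: rot bit 0 is 'flip' and rot bit 1 is 'neg_imag', mirroring
 * the two bits extracted from desc in the quoted helper.
 */
static void fcmla_sketch(float *d, const float *n, const float *m,
                         size_t elements, unsigned rot)
{
    unsigned flip = rot & 1;
    unsigned neg_imag = (rot >> 1) & 1;
    unsigned neg_real = flip ^ neg_imag;

    for (size_t i = 0; i + 1 < elements; i += 2) {
        /* Real part: SVE element1/element2 in the IsEven(e) branch. */
        float e0 = n[i + flip];
        float e1 = m[i + flip];
        /* Imaginary part: SVE element1/element2 in the else branch. */
        float e2 = e0;
        float e3 = m[i + 1 - flip];

        if (neg_real) {
            e1 = -e1;
        }
        if (neg_imag) {
            e3 = -e3;
        }

        /* Plain multiply-add here; the real helper would be fused. */
        d[i] += e0 * e1;
        d[i + 1] += e2 * e3;
    }
}
```

Applying rot 0 and then rot 1 to the same accumulator gives a full complex
multiply-accumulate, which is a quick way to sanity-check the element choice.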
thanks
-- PMM