qemu-devel

From: Alex Bennée
Subject: Re: [Qemu-devel] [PATCH v5 02/35] target/arm: Implement SVE Contiguous Load, first-fault and no-fault
Date: Wed, 27 Jun 2018 12:37:30 +0100
User-agent: mu4e 1.1.0; emacs 26.1.50

Richard Henderson <address@hidden> writes:

> On 06/26/2018 05:52 AM, Alex Bennée wrote:
>>> +#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H)                             \
>>> +static void do_sve_ldff1##PART(CPUARMState *env, void *vd, void *vg,    \
>>> +                               target_ulong addr, intptr_t oprsz,       \
>>> +                               bool first, uintptr_t ra)                \
>>> +{                                                                       \
>>> +    intptr_t i = 0;                                                     \
>>> +    do {                                                                \
>>> +        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                 \
>>> +        do {                                                            \
>>> +            TYPEM m = 0;                                                \
>>> +            if (pg & 1) {                                               \
>>> +                if (!first &&                                           \
>>> +                    page_check_range(addr, sizeof(TYPEM), PAGE_READ)) { \
>>> +                    record_fault(env, i, oprsz);                        \
>>> +                    return;                                             \
>>> +                }                                                       \
>>> +                m = FN(env, addr, ra);                                  \
>>> +                first = false;                                          \
>>> +            }                                                           \
>>> +            *(TYPEE *)(vd + H(i)) = m;                                  \
>>> +            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);                   \
>>> +            addr += sizeof(TYPEM);                                      \
>>> +        } while (i & 15);                                               \
>>> +    } while (i < oprsz);                                                \
>>> +}
>>>  \
>> So I noticed that the disassembly of these two functions is mostly
>> parameter pushing and popping. Is there a case to be made to use the
>> __flatten__ approach and see how the compiler unrolls it all?
>
> Em... for the most part the functions being called are not inlinable,
> being defined in accel/tcg/.
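
To make the point about inlining concrete, here is a minimal sketch, assuming GCC or Clang and their `flatten` and `noinline` function attributes. The names `local_load`, `opaque_load` and `sum_flat` are hypothetical and not from QEMU. `noinline` is used only as a single-file stand-in for a helper whose body lives in another translation unit (as with the accel/tcg/ functions), so the compiler cannot fold it into the caller even when the caller is flattened.

```c
#include <assert.h>
#include <stdint.h>

/* Body is visible in this translation unit, so it is an inlining candidate. */
static inline uint8_t local_load(const uint8_t *p, int i)
{
    return p[i];
}

/*
 * Hypothetical stand-in for a helper defined in a different .c file.
 * noinline models "the compiler cannot see or merge the body here".
 */
__attribute__((noinline))
static uint8_t opaque_load(const uint8_t *p, int i)
{
    return p[i];
}

/*
 * flatten asks the compiler to inline every call made from this function
 * where it can.  local_load() can be merged in; the call to opaque_load()
 * is expected to stay a real call, with the usual argument setup around it.
 */
__attribute__((flatten))
static unsigned sum_flat(const uint8_t *p, int n)
{
    unsigned total = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        total += local_load(p, i);   /* candidate for inlining */
        total += opaque_load(p, i);  /* remains an out-of-line call */
    }
    return total;
}
```

Comparing the disassembly of `sum_flat` with and without the attribute is one way to see how much of the call overhead is actually removable; the result computed is the same either way.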

*sigh* I guess. It's a shame because the numbers get more disappointing:

12:13:48 address@hidden:~/l/q/q/aarch64-linux-user] review/rth-sve-v5(+26/-1) + 
./qemu-aarch64 ./tests/simd-memcpy libc intreg intpair simdreg simdpair sve
libc, 248298053, 4228 kb/s
intreg, 646085220, 1623 kb/s
intpair, 369350825, 2841 kb/s
simdreg, 1422096252, 737 kb/s
simdpair, 1369635566, 765 kb/s
sve, 2646179942, 396 kb/s

and the above example doesn't even include the cost of page_check_range. I
guess this can't be improved until other architectures have a similar
predicated load that we could use in generated code. Helpers are always
going to suck here :-/
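
For readers following the quoted DO_LDFF1 macro, the per-element control flow can be modelled in a few lines of plain C. This is a sketch under stated assumptions, not the QEMU helper: `MODEL_VL`, `model_readable` and `model_ldff1b` are hypothetical names, `model_readable` stands in for page_check_range, the predicate and FFR are one byte per element rather than packed bits, and the assumption that record_fault marks every element from the faulting index onwards as not loaded is a reading of the patch rather than something shown in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_VL 16 /* hypothetical vector length, in byte elements */

/* Hypothetical stand-in for page_check_range(): is addr readable? */
static bool model_readable(size_t addr, size_t mem_len)
{
    return addr < mem_len;
}

/*
 * Model of a first-fault load of byte elements.
 *   pred[i] != 0  : element i is active
 *   ffr[i]        : cleared from the first unloadable element onwards
 *                   (assumed behaviour of record_fault)
 * Returns the index at which loading stopped, MODEL_VL if every element
 * was processed, or -1 if the first active element itself is unreadable
 * (where the real helper would take a normal fault instead).
 */
static int model_ldff1b(uint8_t *vd, uint8_t *ffr, const uint8_t *pred,
                        const uint8_t *mem, size_t mem_len, size_t addr)
{
    bool first = true;

    for (int i = 0; i < MODEL_VL; i++, addr++) {
        uint8_t m = 0; /* inactive elements are written as zero */

        if (pred[i]) {
            if (!model_readable(addr, mem_len)) {
                if (first) {
                    return -1; /* first active element: fault normally */
                }
                for (int j = i; j < MODEL_VL; j++) {
                    ffr[j] = 0; /* record where loading stopped */
                }
                return i; /* later destination elements are left as-is */
            }
            m = mem[addr];
            first = false;
        }
        vd[i] = m;
    }
    return MODEL_VL;
}
```

The readability check runs once per active element after the first, which is the extra per-element cost referred to above, on top of the out-of-line load itself.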

Anyway my boy-racer disappointments aside:

Reviewed-by: Alex Bennée <address@hidden>

--
Alex Bennée


