qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5 11/35] target/arm: Implement SVE scatter stor


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v5 11/35] target/arm: Implement SVE scatter stores
Date: Tue, 26 Jun 2018 07:21:30 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 06/25/2018 09:13 AM, Peter Maydell wrote:
> On 21 June 2018 at 02:53, Richard Henderson
> <address@hidden> wrote:
>> Signed-off-by: Richard Henderson <address@hidden>
>> ---
>>  target/arm/helper-sve.h    | 41 +++++++++++++++++++++
>>  target/arm/sve_helper.c    | 62 ++++++++++++++++++++++++++++++++
>>  target/arm/translate-sve.c | 74 ++++++++++++++++++++++++++++++++++++++
>>  target/arm/sve.decode      | 39 ++++++++++++++++++++
>>  4 files changed, 216 insertions(+)
>> +/* Stores with a vector index.  */
>> +
>> +#define DO_ST1_ZPZ_S(NAME, TYPEI, FN)                                   \
>> +void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
>> +                  target_ulong base, uint32_t desc)                     \
>> +{                                                                       \
>> +    intptr_t i, oprsz = simd_oprsz(desc) / 8;                           \
>> +    unsigned scale = simd_data(desc);                                   \
>> +    uintptr_t ra = GETPC();                                             \
>> +    uint32_t *d = vd; TYPEI *m = vm; uint8_t *pg = vg;                  \
>> +    for (i = 0; i < oprsz; i++) {                                       \
>> +        uint8_t pp = pg[H1(i)];                                         \
>> +        if (pp & 0x01) {                                                \
>> +            target_ulong off = (target_ulong)m[H4(i * 2)] << scale;     \
>> +            FN(env, base + off, d[H4(i * 2)], ra);                      \
>> +        }                                                               \
>> +        if (pp & 0x10) {                                                \
>> +            target_ulong off = (target_ulong)m[H4(i * 2 + 1)] << scale; \
>> +            FN(env, base + off, d[H4(i * 2 + 1)], ra);                  \
>> +        }                                                               \
>> +    }                                                                   \
>> +}
> 
> Why do we do two operations per loop here? Generally
> we seem to do one operation per loop elsewhere.

I'm not sure why I wrote this one in this way.
There doesn't seem to be a good reason.


>> +static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
>> +{
>> +    /* Indexed by [xs][msz].  */
>> +    static gen_helper_gvec_mem_scatter * const fn32[2][3] = {
>> +        { gen_helper_sve_stbs_zsu,
>> +          gen_helper_sve_sths_zsu,
>> +          gen_helper_sve_stss_zsu, },
>> +        { gen_helper_sve_stbs_zss,
>> +          gen_helper_sve_sths_zss,
>> +          gen_helper_sve_stss_zss, },
>> +    };
>> +    static gen_helper_gvec_mem_scatter * const fn64[3][4] = {
> 
> In the pseudocode xs is either 0 (zero-extend offset) or
> 1 (sign-extend offset), but here it can also be 2. A
> comment noting that we've overloaded it to also indicate
> whether we're dealing with a 32-bit or 64-bit offset might
> help (at least I think that's what we're doing).

Yes, xs=2 is overloaded to mean 64-bit offset.
I'll add comments.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]