Re: [qemu-s390x] [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECT


From: David Hildenbrand
Subject: Re: [qemu-s390x] [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK
Date: Tue, 26 Feb 2019 20:23:18 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

On 26.02.19 20:12, Richard Henderson wrote:
> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>> +static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
>> +{
>> +    const uint16_t i2 = get_field(s->fields, i2);
>> +    TCGv_i32 ones = tcg_const_i32(-1u);
>> +    TCGv_i32 zeroes = tcg_const_i32(0);
>> +    int i;
>> +
>> +    for (i = 0; i < 16; i++) {
>> +        if (extract32(i2, 15 - i, 1)) {
>> +            write_vec_element_i32(ones, get_field(s->fields, v1), i, MO_8);
>> +        } else {
>> +            write_vec_element_i32(zeroes, get_field(s->fields, v1), i, MO_8);
>> +        }
>> +    }
>> +    tcg_temp_free_i32(ones);
>> +    tcg_temp_free_i32(zeroes);
>> +    return DISAS_NEXT;
>> +}
> 
> While this works, it's not in the spirit of
> 
>> Programming Note: VECTOR GENERATE BYTE
>> MASK is the preferred method for setting a vector
>> register to all zeroes or ones.

Good point, I skipped that note so far.
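
For reference, here is a minimal host-side sketch of what the instruction computes and why the programming note singles out two immediates. It is plain C, not QEMU code; the names `vgbm_ref` and `vgbm_is_uniform` are made up for illustration. It assumes bit i of the 16-bit immediate, counted from the leftmost bit, selects byte element i, which matches the `extract32(i2, 15 - i, 1)` test in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Reference model (hypothetical helper): expand the 16-bit immediate
 * into 16 bytes. Bit i, counted from the left, yields 0xff or 0x00
 * for byte element i.
 */
static void vgbm_ref(uint16_t i2, uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        out[i] = ((i2 >> (15 - i)) & 1) ? 0xff : 0x00;
    }
}

/*
 * True when every byte of the result is identical, i.e. the vector is
 * all zeroes (i2 == 0x0000) or all ones (i2 == 0xffff).
 */
static bool vgbm_is_uniform(uint16_t i2)
{
    uint8_t v[16];

    vgbm_ref(i2, v);
    for (int i = 1; i < 16; i++) {
        if (v[i] != v[0]) {
            return false;
        }
    }
    return true;
}
```

Only 0x0000 and 0xffff produce a uniform vector in this model, so those are the two cases where a translator could hope for a single cheap host operation instead of sixteen byte stores.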

> 
> Better, I think, with

There are many instructions to implement, so I have had little time to
fine-tune things so far. However, I have tests for VGBM, so I can easily
get it working. Will play with it!

> 
> uint64_t generate_byte_mask(uint8_t mask)
> {
>     uint64_t r = 0;
>     int i;
>     for (i = 0; i < 8; i++) {
>         if ((mask >> i) & 1) {
>             r |= 0xffull << (i * 8);
>         }
>     }
>     return r;
> }
> 
>     if (i2 == (i2 & 0xff) * 0x0101) {
>         /* masks for both halves of the vector are the same.
>            trust tcg to produce a good constant loading.  */
>         tcg_gen_gvec_dup64i(vec_full_reg_offset(s, v1), 16, 16,
>                             generate_byte_mask(i2 & 0xff));
>     } else {
>         TCGv_i64 t = tcg_temp_new_i64();
>         tcg_gen_movi_i64(t, generate_byte_mask(i2 >> 8));
>         write_vec_element_i64(t, v1, 0, MO_64);
>         tcg_gen_movi_i64(t, generate_byte_mask(i2 & 0xff));
>         write_vec_element_i64(t, v1, 1, MO_64);
>         tcg_temp_free_i64(t);
>     }
> 
> Somewhere behind tcg_gen_gvec_dup64i, I check to see if the constant can be
> decomposed further, which will eventually bottom out at
> 
>       vpxor   %xmm0,%xmm0,%xmm0               // all zeros
>       vpcmpeq %xmm0,%xmm0,%xmm0               // all ones
> 
> and even more interesting combinations for tcg/aarch64.
> 
> 

At this point I want to highlight how helpful your reviews are. Amazing! :)
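
As a sanity check on the suggested decomposition, the following standalone sketch models the two-doubleword approach on the host and compares it against the per-byte definition for every immediate. It is an illustration under stated assumptions, not the QEMU implementation: `Vec128`, `halves_identical`, `vgbm_model` and `count_mismatches` are hypothetical names, and the model assumes doubleword element 0 holds byte elements 0-7 with byte 0 in the most significant position (big-endian element order).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Expand each bit of an 8-bit mask into one full byte of a 64-bit value. */
static uint64_t generate_byte_mask(uint8_t mask)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        if ((mask >> i) & 1) {
            r |= 0xffull << (i * 8);
        }
    }
    return r;
}

/* Hypothetical model of a vector register: hi = doubleword 0, lo = doubleword 1. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} Vec128;

/* True when both bytes of the immediate are equal, so one constant can be duplicated. */
static bool halves_identical(uint16_t i2)
{
    return i2 == (i2 & 0xff) * 0x0101;
}

static Vec128 vgbm_model(uint16_t i2)
{
    Vec128 v;

    if (halves_identical(i2)) {
        /* Corresponds to the "dup one 64-bit constant" branch. */
        v.hi = v.lo = generate_byte_mask(i2 & 0xff);
    } else {
        v.hi = generate_byte_mask(i2 >> 8);
        v.lo = generate_byte_mask(i2 & 0xff);
    }
    return v;
}

/* Compare the model against the per-byte definition for all 65536 immediates. */
static int count_mismatches(void)
{
    int bad = 0;

    for (uint32_t i2 = 0; i2 <= 0xffff; i2++) {
        Vec128 v = vgbm_model((uint16_t)i2);

        for (int j = 0; j < 16; j++) {
            uint64_t half = j < 8 ? v.hi : v.lo;
            uint8_t got = (half >> (8 * (7 - (j % 8)))) & 0xff;
            uint8_t want = ((i2 >> (15 - j)) & 1) ? 0xff : 0x00;

            bad += (got != want);
        }
    }
    return bad;
}
```

Under these assumptions the all-zeroes and all-ones immediates both take the duplicated-constant branch, which is the case the programming note cares about.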

> 
> r~
> 


-- 

Thanks,

David / dhildenb


