From: Chen Gang
Subject: Re: [Qemu-devel] [PATCH] target-tilegx: Implement v*add and v*sub instructions
Date: Mon, 21 Sep 2015 06:37:57 +0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
On 9/19/15 10:34, Richard Henderson wrote:
> On 09/18/2015 05:03 PM, address@hidden wrote:
>> +uint64_t helper_v1add(uint64_t a, uint64_t b)
>> +{
>> +    uint64_t r = 0;
>> +    int i;
>> +
>> +    for (i = 0; i < 64; i += 8) {
>> +        int64_t ae = (int8_t)(a >> i);
>> +        int64_t be = (int8_t)(b >> i);
>> +        r |= ((ae + be) & 0xff) << i;
>> +    }
>> +    return r;
>> +}
>> +
>> +uint64_t helper_v2add(uint64_t a, uint64_t b)
>> +{
>> +    uint64_t r = 0;
>> +    int i;
>> +
>> +    for (i = 0; i < 64; i += 16) {
>> +        int64_t ae = (int16_t)(a >> i);
>> +        int64_t be = (int16_t)(b >> i);
>> +        r |= ((ae + be) & 0xffff) << i;
>> +    }
>> +    return r;
>> +}
>
> There's a trick for this that's more efficient for 4 or more elements per
> vector (i.e. good for v2 and v1, but not v4):
>
> a + b = ((a & 0x7f7f7f7f) + (b & 0x7f7f7f7f)) ^ ((a ^ b) & 0x80808080)
>
> a - b = ((a | 0x80808080) - (b & 0x7f7f7f7f)) ^ ((a ^ ~b) & 0x80808080)
>
OK, thanks. That looks like a good idea to me. :-)
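A minimal, self-contained sketch of the masking trick above, widened to 64-bit registers. Assumptions: the 32-bit masks from the reply are simply extended to all eight 8-bit lanes (v1) or four 16-bit lanes (v2), and the names `v1add`, `v1sub`, `v2add`, `v2sub`, `swar_add`, `swar_sub` and `lanewise_ref` are hypothetical, not existing QEMU helpers. `lanewise_ref` is a plain per-lane loop in the style of the patch's helpers (on unsigned lanes, which yields the same bits, since the low n bits of a lane sum or difference do not depend on signedness) and is only there to check the trick.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask names: LOW keeps all but the top bit of every lane,
 * HIGH keeps only the top bit of every lane. */
#define V1_LOW  UINT64_C(0x7f7f7f7f7f7f7f7f)  /* eight 8-bit lanes */
#define V1_HIGH UINT64_C(0x8080808080808080)
#define V2_LOW  UINT64_C(0x7fff7fff7fff7fff)  /* four 16-bit lanes */
#define V2_HIGH UINT64_C(0x8000800080008000)

/* Lane-wise add: with each lane's top bit cleared, the partial sums cannot
 * carry into the next lane. The top bit of each partial sum is the carry
 * into that bit; XOR-ing in top(a) ^ top(b) completes the sum bit. */
static uint64_t swar_add(uint64_t a, uint64_t b, uint64_t low, uint64_t high)
{
    return ((a & low) + (b & low)) ^ ((a ^ b) & high);
}

/* Lane-wise subtract: every lane of (a | high) is larger than the matching
 * lane of (b & low), so no borrow crosses a lane boundary. The top bit of
 * each partial difference is 1 ^ borrow-in; XOR-ing in top(a) ^ ~top(b)
 * gives the true top bit. */
static uint64_t swar_sub(uint64_t a, uint64_t b, uint64_t low, uint64_t high)
{
    return ((a | high) - (b & low)) ^ ((a ^ ~b) & high);
}

uint64_t v1add(uint64_t a, uint64_t b) { return swar_add(a, b, V1_LOW, V1_HIGH); }
uint64_t v1sub(uint64_t a, uint64_t b) { return swar_sub(a, b, V1_LOW, V1_HIGH); }
uint64_t v2add(uint64_t a, uint64_t b) { return swar_add(a, b, V2_LOW, V2_HIGH); }
uint64_t v2sub(uint64_t a, uint64_t b) { return swar_sub(a, b, V2_LOW, V2_HIGH); }

/* Reference: one lane at a time, like the loops in the patch. */
uint64_t lanewise_ref(uint64_t a, uint64_t b, int bits, int sub)
{
    uint64_t mask, r = 0;
    int i;

    assert(bits == 8 || bits == 16 || bits == 32);
    mask = (UINT64_C(1) << bits) - 1;
    for (i = 0; i < 64; i += bits) {
        uint64_t ae = (a >> i) & mask;
        uint64_t be = (b >> i) & mask;
        r |= ((sub ? ae - be : ae + be) & mask) << i;
    }
    return r;
}
```

Each v1/v2 operation becomes a handful of 64-bit ALU ops with no loop; with only two lanes in v4 the extra masking buys little, which matches the "not v4" remark.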
>> +uint64_t helper_v4add(uint64_t a, uint64_t b)
>> +{
>> +    uint64_t r = 0;
>> +    int i;
>> +
>> +    for (i = 0; i < 64; i += 32) {
>> +        int64_t ae = (int32_t)(a >> i);
>> +        int64_t be = (int32_t)(b >> i);
>> +        r |= ((ae + be) & 0xffffffff) << i;
>> +    }
>> +    return r;
>> +}
>
> I should have mentioned this in the previous patch...
>
Hmm... maybe, but in any case, I forgot.
> I think probably it would be best to open-code all, or most of, the v4
> operations. Something like
>
> static void gen_v4op(TCGv d64, TCGv a64, TCGv b64,
>                      void (*generate)(TCGv_i32, TCGv_i32, TCGv_i32))
> {
>     TCGv_i32 al = tcg_temp_new_i32();
>     TCGv_i32 ah = tcg_temp_new_i32();
>     TCGv_i32 bl = tcg_temp_new_i32();
>     TCGv_i32 bh = tcg_temp_new_i32();
>
>     tcg_gen_extr_i64_i32(al, ah, a64);
>     tcg_gen_extr_i64_i32(bl, bh, b64);
>     generate(al, al, bl);
>     generate(ah, ah, bh);
>     tcg_gen_concat_i32_i64(d64, al, ah);
>
>     tcg_temp_free_i32(al);
>     tcg_temp_free_i32(ah);
>     tcg_temp_free_i32(bl);
>     tcg_temp_free_i32(bh);
> }
>
>> case OE_RRR(V4ADD, 0, X0):
>> case OE_RRR(V4ADD, 0, X1):
>> - return TILEGX_EXCP_OPCODE_UNIMPLEMENTED;
>> + gen_helper_v4add(tdest, tsrca, tsrcb);
>
> And then
>
> gen_v4op(tdest, tsrca, tsrcb, tcg_gen_add_i32);
>
OK, thanks. What you suggest sounds reasonable to me.
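gen_v4op itself emits TCG ops at translation time and only runs inside QEMU, so the following is just a plain-C model of what the emitted code computes at run time. The names `v4op_model`, `add32`, `sub32`, `v4add_model` and `v4sub_model` are hypothetical; the comments mark which TCG call from the suggestion each step corresponds to.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit operation, standing in for tcg_gen_add_i32 and friends. */
typedef uint32_t (*op32_fn)(uint32_t a, uint32_t b);

static uint32_t add32(uint32_t a, uint32_t b) { return a + b; }
static uint32_t sub32(uint32_t a, uint32_t b) { return a - b; }

/* Split both operands into 32-bit halves, apply op to each pair of
 * halves independently, then join the two results. */
uint64_t v4op_model(uint64_t a, uint64_t b, op32_fn op)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32); /* cf. tcg_gen_extr_i64_i32 */
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);
    uint32_t dl, dh;

    assert(op != 0);
    dl = op(al, bl);                          /* generate(al, al, bl) */
    dh = op(ah, bh);                          /* generate(ah, ah, bh) */
    return ((uint64_t)dh << 32) | dl;         /* cf. tcg_gen_concat_i32_i64 */
}

uint64_t v4add_model(uint64_t a, uint64_t b) { return v4op_model(a, b, add32); }
uint64_t v4sub_model(uint64_t a, uint64_t b) { return v4op_model(a, b, sub32); }
```

Because each half goes through its own 32-bit operation, a carry or borrow out of the low lane never reaches the high lane, which is the same lane isolation the loop in helper_v4add provides, without a helper call.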
Thanks.
--
Chen Gang (陈刚)
Open, share, and attitude like air, water, and life which God blessed