qemu-s390x
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [qemu-s390x] [Qemu-devel] [PATCH v1 1/5] s390x/tcg: Implement VECTOR


From: David Hildenbrand
Subject: Re: [qemu-s390x] [Qemu-devel] [PATCH v1 1/5] s390x/tcg: Implement VECTOR FIND ANY ELEMENT EQUAL
Date: Mon, 20 May 2019 11:51:12 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 17.05.19 18:16, Richard Henderson wrote:
> On 5/15/19 1:31 PM, David Hildenbrand wrote:
>> +#define DEF_VFAE(BITS)                                                      
>>    \
>> +static int vfae##BITS(void *v1, const void *v2, const void *v3, uint8_t m5) 
>>    
> 
> 
> First, because this *is* complicated stuff, can we find a way to use inline
> functions instead of an undebuggable macro for this?  Perhaps a different set
> of wrappers than s390_vec_read_element##BITS, which always return uint32_t, so
> that they have a constant signature?

For vfene I have for now

+static inline uint64_t s390_vec_read_element(const S390Vector *v,
uint8_t enr,
+                                             uint8_t es)
+{
+    switch (es) {
+    case MO_8:
+        return s390_vec_read_element8(v, enr);
+    case MO_16:
+        return s390_vec_read_element16(v, enr);
+    case MO_32:
+        return s390_vec_read_element32(v, enr);
+    case MO_64:
+        return s390_vec_read_element64(v, enr);
+    default:
+        g_assert_not_reached();
+    }
+}
+

Which we could reuse here.

I'll try to look into using a inline function instead, passing in the
element size and other flags, so the compiler can specialize.

Thanks!

> 
>> +        if (zs && !data) {
>> +            if (cc == 3) {
>> +                first_byte = i * (BITS / 8);
>> +                cc = 0; /* match for zero */
>> +            } else if (cc != 0) {
>> +                cc = 2; /* matching elements before match for zero */
>> +            }
>> +            if (!rt) {
>> +                break;
>> +            }
>> +        }    
> 
> So here we are computing the second intermediate result.
> 
>> +        /* try to match with any other element from the other vector */
>> +        for (j = 0; j < (128 / BITS); j++) {
>> +            if (data == s390_vec_read_element##BITS(v3, j)) {
>> +                any_equal = true;
>> +                break;
>> +            }
>> +        }
> 
> And here the first intermediate result,
> 
>> +        /* invert the result if requested */
>> +        any_equal = in ^ any_equal;
> 
> ... inverted, if requested,
> 
>> +        if (cc == 3 && any_equal) {
>> +            first_byte = i * (BITS / 8);
>> +            cc = 1; /* matching elements, no match for zero */
>> +            if (!zs && !rt) {
>> +                break;
>> +            }
>> +        }
> 
>> +        /* indicate bit vector if requested */
>> +        if (rt && any_equal) {
>> +            s390_vec_write_element##BITS(&tmp, i, (uint##BITS##_t)-1ull);
>> +        }
> 
> ... writing out (some of) the results of the first intermediate result.
> 
>> +    }
>> +    if (!rt) {
>> +        s390_vec_write_element8(&tmp, 7, first_byte);
>> +    }
> 
> ... writing out the rest of the first intermediate result.
> 
> I wonder if it wouldn't be clearer, within the loop, to do
> 
>       if (any_equal) {
>           if (cc == 3) {
>               first_byte = ...;
>               cc = 1;
>           }
>           if (rt) {
>               write element -1;
>           } else if (!zs) {
>               break;
>           }
>       }
> 
> I also think that, if we create a bunch more of these wrappers:
> 
>> +DEF_VFAE_HELPER(8)
>> +DEF_VFAE_HELPER(16)
>> +DEF_VFAE_HELPER(32)
> 
> then RT and ZS can be passed in as constant parameters to the above, and then
> the compiler will fold away all of the stuff that's not needed for each
> different case.  Which, I think, is significant.  These are practically
> different instructions with the different modifiers.
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb



reply via email to

[Prev in Thread] Current Thread [Next in Thread]