qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v1 1/5] s390x/tcg: Implement VECTOR FIND ANY ELE


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v1 1/5] s390x/tcg: Implement VECTOR FIND ANY ELEMENT EQUAL
Date: Fri, 17 May 2019 09:16:40 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 5/15/19 1:31 PM, David Hildenbrand wrote:
> +#define DEF_VFAE(BITS)                                                       
>   \
> +static int vfae##BITS(void *v1, const void *v2, const void *v3, uint8_t m5)  
>   


First, because this *is* complicated stuff, can we find a way to use inline
functions instead of an undebuggable macro for this?  Perhaps a different set
of wrappers than s390_vec_read_element##BITS, which always return uint32_t, so
that they have a constant signature?

> +        if (zs && !data) {
> +            if (cc == 3) {
> +                first_byte = i * (BITS / 8);
> +                cc = 0; /* match for zero */
> +            } else if (cc != 0) {
> +                cc = 2; /* matching elements before match for zero */
> +            }
> +            if (!rt) {
> +                break;
> +            }
> +        }    

So here we are computing the second intermediate result.

> +        /* try to match with any other element from the other vector */
> +        for (j = 0; j < (128 / BITS); j++) {
> +            if (data == s390_vec_read_element##BITS(v3, j)) {
> +                any_equal = true;
> +                break;
> +            }
> +        }

And here the first intermediate result,

> +        /* invert the result if requested */
> +        any_equal = in ^ any_equal;

... inverted, if requested,

> +        if (cc == 3 && any_equal) {
> +            first_byte = i * (BITS / 8);
> +            cc = 1; /* matching elements, no match for zero */
> +            if (!zs && !rt) {
> +                break;
> +            }
> +        }

> +        /* indicate bit vector if requested */
> +        if (rt && any_equal) {
> +            s390_vec_write_element##BITS(&tmp, i, (uint##BITS##_t)-1ull);
> +        }

... writing out (some of) the results of the first intermediate result.

> +    }
> +    if (!rt) {
> +        s390_vec_write_element8(&tmp, 7, first_byte);
> +    }

... writing out the rest of the first intermediate result.

I wonder if it wouldn't be clearer, within the loop, to do

        if (any_equal) {
            if (cc == 3) {
                first_byte = ...;
                cc = 1;
            }
            if (rt) {
                write element -1;
            } else if (!zs) {
                break;
            }
        }

I also think that, if we create a bunch more of these wrappers:

> +DEF_VFAE_HELPER(8)
> +DEF_VFAE_HELPER(16)
> +DEF_VFAE_HELPER(32)

then RT and ZS can be passed in as constant parameters to the above, and then
the compiler will fold away all of the stuff that's not needed for each
different case.  Which, I think, is significant.  These are practically
different instructions with the different modifiers.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]