Re: [Qemu-devel] [PATCH 10/10] cutils: Rewrite x86 buffer zero checking


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH 10/10] cutils: Rewrite x86 buffer zero checking
Date: Tue, 13 Sep 2016 09:27:02 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 09/13/2016 09:10 AM, Paolo Bonzini wrote:
> @@ -177,16 +231,15 @@ bool test_buffer_is_zero_next_accel(void)
>  
>  static bool select_accel_fn(const void *buf, size_t len)
>  {
> -    uintptr_t ibuf = (uintptr_t)buf;
>  #ifdef CONFIG_AVX2_OPT
> -    if (len % 128 == 0 && ibuf % 32 == 0 && (cpuid_cache & CACHE_AVX2)) {
> +    if (len >= 128 && (cpuid_cache & CACHE_AVX2)) {
>          return buffer_zero_avx2(buf, len);
>      }
> -    if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE4)) {
> +    if (len >= 64 && (cpuid_cache & CACHE_SSE4)) {
>          return buffer_zero_sse4(buf, len);
>      }
>  #endif
> -    if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE2)) {
> +    if (len >= 64 && (cpuid_cache & CACHE_SSE2)) {
>          return buffer_zero_sse2(buf, len);
>      }

You've dropped a major change to select_accel_fn here.

(1) The avx2 routine, as written, can support len >= 64, so a single common
test works for all of the vectorized functions.

(2) I had saved the pointer to the routine, so that we didn't have to
repeatedly test multiple cpuid_cache bits.
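
A rough sketch of what points (1) and (2) amount to, not the code from the
original patch. The CACHE_* values, the init_accel() helper, the
buffer_accel variable, and buffer_zero_scalar() are hypothetical names
chosen for illustration, and the vectorized routines are replaced by
placeholders that only assert the shared len >= 64 precondition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cpuid_cache feature bits. */
#define CACHE_SSE2 1u
#define CACHE_SSE4 2u
#define CACHE_AVX2 4u

typedef bool (*accel_zero_fn)(const void *buf, size_t len);

/* Plain byte loop; works for any length (hypothetical fallback). */
static bool buffer_zero_scalar(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Placeholders for the vectorized routines.  Each one only checks the
 * common precondition from point (1) and defers to the scalar loop. */
static bool buffer_zero_sse2(const void *buf, size_t len)
{
    assert(len >= 64);
    return buffer_zero_scalar(buf, len);
}

static bool buffer_zero_sse4(const void *buf, size_t len)
{
    assert(len >= 64);
    return buffer_zero_scalar(buf, len);
}

static bool buffer_zero_avx2(const void *buf, size_t len)
{
    assert(len >= 64);
    return buffer_zero_scalar(buf, len);
}

/* Point (2): the routine is chosen once and the pointer is saved. */
static accel_zero_fn buffer_accel = buffer_zero_scalar;

static void init_accel(unsigned cache)
{
    accel_zero_fn fn = buffer_zero_scalar;

    /* Later checks override earlier ones, so the widest ISA wins. */
    if (cache & CACHE_SSE2) {
        fn = buffer_zero_sse2;
    }
    if (cache & CACHE_SSE4) {
        fn = buffer_zero_sse4;
    }
    if (cache & CACHE_AVX2) {
        fn = buffer_zero_avx2;
    }
    buffer_accel = fn;
}

/* Point (1): one length test covers every vectorized routine, and no
 * cpuid_cache bits are examined on the per-call path. */
static bool select_accel_fn(const void *buf, size_t len)
{
    if (len >= 64) {
        return buffer_accel(buf, len);
    }
    return buffer_zero_scalar(buf, len);
}
```

In this sketch init_accel() would be called once, after the cpuid bits have
been read; every later select_accel_fn() call then costs one length
comparison and one indirect call.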


r~


