qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for-4.0] include/qemu/bswap.h: Use __builtin_mem


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH for-4.0] include/qemu/bswap.h: Use __builtin_memcpy() in accessor functions
Date: Mon, 18 Mar 2019 18:07:35 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1

On 18/03/19 12:29, Peter Maydell wrote:
> In the accessor functions ld*_he_p() and st*_he_p() we use memcpy()
> to perform a load or store to a pointer which might not be aligned
> for the size of the type. We rely on the compiler to optimize this
> memcpy() into an efficient load or store instruction where possible.
> This is required for good performance, but at the moment it is also
> required for correct operation, because some users of these functions
> require that the access is atomic if the pointer is aligned, which
> will only be the case if the compiler has optimized out the memcpy().
> (The particular example where we discovered this is the virtio
> vring_avail_idx() which calls virtio_lduw_phys_cached() which
> eventually ends up calling lduw_he_p().)
> 
> Unfortunately some compile environments, such as the fortify-source
> setup used in Alpine Linux, define memcpy() to a wrapper function
> in a way that inhibits this compiler optimization.
> 
> The correct long-term fix here is to add a set of functions for
> doing atomic accesses into AddressSpaces (and to other relevant
> families of accessor functions like the virtio_*_phys_cached()
> ones), and make sure that callsites which want atomic behaviour
> use the correct functions.
> 
> In the meantime, switch to using __builtin_memcpy() in the
> bswap.h accessor functions. This will make us robust against things
> like this fortify library in the short term. In the longer term
> it will mean that we don't end up with these functions being really
> badly-performing even if the semantics of the out-of-line memcpy()
> are correct.
> 
> Reported-by: Fernando Casas Schössow <address@hidden>
> Signed-off-by: Peter Maydell <address@hidden>
> ---
>  include/qemu/bswap.h | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
> index 5a70f78c0ba..2a9f3fe783e 100644
> --- a/include/qemu/bswap.h
> +++ b/include/qemu/bswap.h
> @@ -316,51 +316,57 @@ static inline void stb_p(void *ptr, uint8_t v)
>      *(uint8_t *)ptr = v;
>  }
>  
> -/* Any compiler worth its salt will turn these memcpy into native unaligned
> -   operations.  Thus we don't need to play games with packed attributes, or
> -   inline byte-by-byte stores.  */
> +/*
> + * Any compiler worth its salt will turn these memcpy into native unaligned
> + * operations.  Thus we don't need to play games with packed attributes, or
> + * inline byte-by-byte stores.
> + * Some compilation environments (eg some fortify-source implementations)
> + * may intercept memcpy() in a way that defeats the compiler optimization,
> + * though, so we use __builtin_memcpy() to give ourselves the best chance
> + * of good performance.
> + */
>  
>  static inline int lduw_he_p(const void *ptr)
>  {
>      uint16_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>  
>  static inline int ldsw_he_p(const void *ptr)
>  {
>      int16_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>  
>  static inline void stw_he_p(void *ptr, uint16_t v)
>  {
> -    memcpy(ptr, &v, sizeof(v));
> +    __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>  
>  static inline int ldl_he_p(const void *ptr)
>  {
>      int32_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>  
>  static inline void stl_he_p(void *ptr, uint32_t v)
>  {
> -    memcpy(ptr, &v, sizeof(v));
> +    __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>  
>  static inline uint64_t ldq_he_p(const void *ptr)
>  {
>      uint64_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>  
>  static inline void stq_he_p(void *ptr, uint64_t v)
>  {
> -    memcpy(ptr, &v, sizeof(v));
> +    __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>  
>  static inline int lduw_le_p(const void *ptr)
> 

Queued, thanks.

Paolo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]