From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v2] atomic: using memory_order_relaxed for refcnt inc/dec ops
Date: Sun, 14 Jul 2013 07:57:54 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6
Il 14/07/2013 04:53, Liu Ping Fan ha scritto:
> Refcnt atomic inc/dec ops are frequent, and their idiom needs no seq_cst
> ordering. So, for better performance, it is worth adopting the _relaxed
> memory model for them rather than _seq_cst.
>
> We resort to gcc builtins. If gcc supports the C11 memory model, the __atomic_*
> builtins are used; otherwise the __sync_* builtins.
>
> Signed-off-by: Liu Ping Fan <address@hidden>
No, not at all. :(
First of all, I'd like to understand how you benchmarked this. For
inc/dec, relaxed vs. seq_cst has no effect except on PPC and ARM. And
if the refcount ops are frequent enough, I strongly suspect cacheline
bouncing has a bigger effect than the memory barriers.
Second, it is wrong, because you need a further read memory barrier when
you are removing the last reference.
Third, it is making the API completely unorthogonal, and "tend to be
exceptions" is not a justification.
The justification here could be, rather than the performance, having to
remember how to use atomic_fetch_dec in the unref side. I don't really
buy that, but if you really care, do something like
#define atomic_ref(ptr, field) \
    __atomic_fetch_add(&((ptr)->field), 1, __ATOMIC_RELAXED)

#define atomic_unref(ptr, field, releasefn) ( \
    __atomic_fetch_add(&((ptr)->field), -1, __ATOMIC_RELEASE) == 1 \
    ? (__atomic_thread_fence(__ATOMIC_ACQUIRE), (releasefn)(ptr)) : false)
i.e. define a new interface similar to kref_get/kref_put and, since you
are at it, optimize it.
Paolo
> ---
> v2:
> update atomics.txt
> fix the dependency MACRO
> ---
> docs/atomics.txt | 4 ++++
> include/qemu/atomic.h | 14 ++++++++++++++
> 2 files changed, 18 insertions(+)
>
> diff --git a/docs/atomics.txt b/docs/atomics.txt
> index 6f2997b..c8ad31d 100644
> --- a/docs/atomics.txt
> +++ b/docs/atomics.txt
> @@ -45,6 +45,10 @@ consistency*, where "the result of any execution is the same as if the
> operations of all the processors were executed in some sequential order,
> and the operations of each individual processor appear in this sequence
> in the order specified by its program".
> +Note that atomic_inc/_dec and atomic_fetch_inc/_dec tend to be exceptions.
> +Once gcc provides atomic_* builtins, they adopt *relaxed* memory order.
> +The reason is that these pairs are used by the refcnt's ops which is frequent
> +and has more effect on performance.
>
> qemu/atomic.h provides the following set of atomic read-modify-write
> operations:
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 0aa8913..e8353d2 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -183,8 +183,15 @@
> #endif
>
> /* Provide shorter names for GCC atomic builtins. */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_fetch_inc(ptr) __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED)
> +#define atomic_fetch_dec(ptr) __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED)
> +#else
> +/* close to C11 memory_order_seq_cst */
> #define atomic_fetch_inc(ptr) __sync_fetch_and_add(ptr, 1)
> #define atomic_fetch_dec(ptr) __sync_fetch_and_add(ptr, -1)
> +#endif
> #define atomic_fetch_add __sync_fetch_and_add
> #define atomic_fetch_sub __sync_fetch_and_sub
> #define atomic_fetch_and __sync_fetch_and_and
> @@ -192,8 +199,15 @@
> #define atomic_cmpxchg __sync_val_compare_and_swap
>
> /* And even shorter names that return void. */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_inc(ptr) ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED))
> +#define atomic_dec(ptr) ((void) __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED))
> +#else
> +/* close to C11 memory_order_seq_cst */
> #define atomic_inc(ptr) ((void) __sync_fetch_and_add(ptr, 1))
> #define atomic_dec(ptr) ((void) __sync_fetch_and_add(ptr, -1))
> +#endif
> #define atomic_add(ptr, n) ((void) __sync_fetch_and_add(ptr, n))
> #define atomic_sub(ptr, n) ((void) __sync_fetch_and_sub(ptr, n))
> #define atomic_and(ptr, n) ((void) __sync_fetch_and_and(ptr, n))
>