From: Pranith Kumar
Subject: Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions
Date: Fri, 01 Apr 2016 16:35:37 -0400
User-agent: mu4e 0.9.9.5; emacs 25.1.50.2

Hi Alex,

I have one question inline below.

Alex Bennée writes:

> The __atomic primitives have been available since GCC 4.7 and provide
> a richer interface for describing memory ordering requirements. As a
> bonus by using the primitives instead of hand-rolled functions we can
> use tools such as the AddressSanitizer which need the use of well
> defined APIs for its analysis.
>
> If we have __ATOMIC defines we exclusively use the __atomic primitives
> for all our atomic access. Otherwise we fall back to the mixture of
> __sync and hand-rolled barrier cases.
>
> +/* For C11 atomic ops */
> +
> +/* Manual memory barriers
> + *
> + *__atomic_thread_fence does not include a compiler barrier; instead,
> + * the barrier is part of __atomic_load/__atomic_store's "volatile-like"
> + * semantics. If smp_wmb() is a no-op, absence of the barrier means that
> + * the compiler is free to reorder stores on each side of the barrier.
> + * Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
> + */
> +
> +#define smp_mb()    ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); barrier(); })

I do not understand why we need to wrap the fence in barrier() calls.
There are three parts to my confusion, so let me ask them one at a time.

First, these primitives are used in the QEMU codebase, which runs on the
host architecture. Consider two example host architectures: x86 and ARM.

On x86, __atomic_thread_fence(__ATOMIC_SEQ_CST) generates an mfence
instruction; on ARM, it generates a dmb instruction. Both of these
serializing instructions also act as compiler barriers. Is there any
architecture on which no such serializing instruction is generated?
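
One way to check this per host would be to compile a tiny file and read the
assembly. The following is only a sketch: the function and variable names
are made up for illustration and are not from the patch. It covers the three
memory orders used by the macros, so the weaker fences can be inspected as
well as the SEQ_CST one.

```c
#include <assert.h>

/* Hypothetical shared variable, not part of the patch. */
static int shared_x;

/* Store, full fence, load: look for mfence (x86) or dmb (ARM) here. */
int store_fence_load_seq_cst(int v)
{
    shared_x = v;
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    return shared_x;
}

/* Same shape with the order used by smp_wmb() in the patch. */
int store_fence_load_release(int v)
{
    shared_x = v;
    __atomic_thread_fence(__ATOMIC_RELEASE);
    return shared_x;
}

/* Same shape with the order used by smp_rmb() in the patch. */
int store_fence_load_acquire(int v)
{
    shared_x = v;
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    return shared_x;
}

/* Single-threaded sanity check: the fences must not change the values. */
void fence_self_check(void)
{
    assert(store_fence_load_seq_cst(1) == 1);
    assert(store_fence_load_release(2) == 2);
    assert(store_fence_load_acquire(3) == 3);
}
```

Building this with "gcc -O2 -S" on each host of interest should show which
instruction, if any, each fence turns into.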

> +#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
> +#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })

Second, why do you need barrier() on both sides? One barrier() seems to be
sufficient to prevent the compiler from reordering across the macro. Am I
missing something?
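
To make the comparison concrete, here is a minimal sketch of the two shapes
I have in mind. It assumes barrier() is the usual empty asm with a "memory"
clobber; the macro names smp_mb_wrapped/smp_mb_single and the publish_*
functions are hypothetical and exist only for this comparison.

```c
#include <assert.h>

/* Assumption: barrier() is a compiler-only barrier (GCC/Clang syntax). */
#define barrier()   __asm__ __volatile__("" ::: "memory")

/* Shape used in the patch: fence wrapped by a barrier() on each side. */
#define smp_mb_wrapped() \
    ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); barrier(); })

/* Alternative I am asking about: a single barrier() next to the fence. */
#define smp_mb_single() \
    ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); })

/* Hypothetical payload and flag, plain (non-atomic) on purpose. */
static int msg_data;
static int msg_flag;

/* The two stores must not be reordered across the macro. */
int publish_wrapped(int value)
{
    msg_data = value;
    smp_mb_wrapped();
    msg_flag = 1;
    return msg_data;
}

int publish_single(int value)
{
    msg_data = value;
    smp_mb_single();
    msg_flag = 1;
    return msg_data;
}

/* Single-threaded sanity check of the return values only. */
void publish_self_check(void)
{
    assert(publish_wrapped(42) == 42);
    assert(publish_single(7) == 7);
}
```

Comparing the "gcc -O2 -S" output of publish_wrapped() and publish_single()
would show whether the second barrier() changes the generated code in a
given case. It would not prove that one barrier() is always enough.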

Finally, I looked through the GCC docs but could find nothing saying that
__atomic_thread_fence() is not considered a memory barrier. What I did find
is a post saying that it is treated as a function call during the main
optimization stages, but not during later stages:

http://www.spinics.net/lists/gcchelp/msg39798.html

AFAIU, in these later stages, even adding a barrier() as we are doing will
have no effect.

Can you point me to any docs which talk more about this?

Thanks!
-- 
Pranith


