Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier


From: Sergey Fedorov
Subject: Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
Date: Tue, 7 Jun 2016 00:49:47 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0

On 07/06/16 00:00, Peter Maydell wrote:
> On 6 June 2016 at 21:30, Sergey Fedorov <address@hidden> wrote:
>> On 06/06/16 22:28, Pranith Kumar wrote:
>>> On Mon, Jun 6, 2016 at 3:23 PM, Richard Henderson <address@hidden> wrote:
>>>> On 06/06/2016 10:11 AM, Pranith Kumar wrote:
>>>>> If I read it correctly, TCG_BAR_SYNC is equivalent to the OR of the
>>>>> other four barriers. I am not sure if we can just construct SYNC like
>>>>> this or if we need to define it explicitly, though.
>>>> AFAICS, sparc membar #sync is stronger.
>>> I tried looking it up but it's not clear. How is it stronger? And do
>>> we need those strong guarantees in our front-end/back-end?
>> That is not clear to me either :( AFAIU, PPC's lwsync does allow stores
>> to be reordered after loads, but hwsync does not.
> Yes, from the PoV of the other CPU. That is, for write-then-read by
> CPU 0, CPU 0 will always read what it wrote, but other CPUs don't
> necessarily see the write before the read is satisfied.
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> describes the difference in sections 3.2 and 3.3, and has an
> example in section 6 of a situation which requires a full
> (hw)sync and an lwsync is not sufficient.
>
>> I suspect Sparc's membar
>> #Sync is used to ensure that some system operations are complete before
>> proceeding with execution. I'm not sure we need to introduce this into
>> TCG. It needs to be clear what it is and how to use it.
> My reading of the manual is that the SPARC "membar #Sync" is like ARM
> ISB -- it enforces that any instruction (whether a memory access or not)
> before it must finish before anything after it can start. It only
> affects the CPU that issues it (assuming you didn't also specify
> any of the bits requesting memory barriers!). Since TCG doesn't attempt
> to reorder instructions, we likely don't need to do anything except
> maybe end the current TB. Also if we're still trying to do TLB
> operations on other CPUs asynchronously we need to wait for them to
> finish; I forget what the conclusion was on that idea.
> PPC equivalent insn is isync I think.
>

Thanks for commenting on this, Peter. AFAIU, a sequential consistency
barrier is stronger than a release-acquire barrier because it provides
"transitivity/commutativity" [1]. This is what a general barrier
guarantees in Linux [2]. I especially like this piece of the
description from [2]:

    ... if this example runs on a system where CPUs 1 and 2 share a
    store buffer or a level of cache, CPU 2 might have early access to
    CPU 1's writes. General barriers are therefore required to ensure
    that all CPUs agree on the combined order of CPU 1's and CPU 2's
    accesses.
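
To make that point concrete, here is a quick C11 sketch (mine, not from
memory-barriers.txt or from Pranith's series; all the names are made up)
of the IRIW-style test that passage is talking about:

/*
 * IRIW ("independent reads of independent writes"): with seq_cst
 * (i.e. full/"general" barrier) semantics, the outcome
 * r1==1 r2==0 r3==1 r4==0 is forbidden, because every CPU has to agree
 * on one combined order of the two stores.  With only acquire/release
 * ordering, that outcome is allowed on machines where a write can be
 * seen early through a shared store buffer.
 */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_int x, y;
static int r1, r2, r3, r4;

static void *writer_x(void *arg) { (void)arg; atomic_store(&x, 1); return NULL; }
static void *writer_y(void *arg) { (void)arg; atomic_store(&y, 1); return NULL; }

static void *reader_xy(void *arg)
{
    (void)arg;
    r1 = atomic_load(&x);       /* both loads are seq_cst by default */
    r2 = atomic_load(&y);
    return NULL;
}

static void *reader_yx(void *arg)
{
    (void)arg;
    r3 = atomic_load(&y);
    r4 = atomic_load(&x);
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    pthread_create(&t[0], NULL, writer_x, NULL);
    pthread_create(&t[1], NULL, writer_y, NULL);
    pthread_create(&t[2], NULL, reader_xy, NULL);
    pthread_create(&t[3], NULL, reader_yx, NULL);
    for (int i = 0; i < 4; i++) {
        pthread_join(t[i], NULL);
    }
    /* "r1=1 r2=0 r3=1 r4=0" must never appear with seq_cst accesses. */
    printf("r1=%d r2=%d r3=%d r4=%d\n", r1, r2, r3, r4);
    return 0;
}

(A single run proves nothing, of course -- litmus tools hammer this
shape in a loop -- but it shows what "all CPUs agree on the combined
order" means in practice.)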

The current Linux kernel implements Sparc's smp_mb()/__smp_mb()/mb()
with "membar #StoreLoad" [3]. So we'll probably be fine with just RR,
RW, WR, and WW bits in the TCG memory barrier operation attribute.
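
Just to illustrate how I picture that attribute (the names below are
made up for this mail, not taken from Pranith's patch), it could be a
plain bitmask whose full-barrier value is simply the OR of the four
ordering bits:

enum {
    TCG_BAR_RR = 1 << 0,  /* order earlier loads against later loads   */
    TCG_BAR_RW = 1 << 1,  /* order earlier loads against later stores  */
    TCG_BAR_WR = 1 << 2,  /* order earlier stores against later loads  */
    TCG_BAR_WW = 1 << 3,  /* order earlier stores against later stores */

    /* A full barrier is then just the OR of the four ordering bits. */
    TCG_BAR_FULL = TCG_BAR_RR | TCG_BAR_RW | TCG_BAR_WR | TCG_BAR_WW,
};

A front end would map e.g. PPC hwsync to TCG_BAR_FULL, and AFAIU the
Sparc backend would only have to emit "membar #StoreLoad" when the WR
bit is set, since TSO already gives it the other three orderings.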

Kind regards,
Sergey

[1] https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync#Overall_Summary
[2] http://lxr.free-electrons.com/source/Documentation/memory-barriers.txt#L1268
[3] http://lxr.free-electrons.com/source/arch/sparc/include/asm/barrier_64.h#L36

