From: Richard Henderson
Subject: Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
Date: Thu, 2 Jun 2016 18:08:57 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0

On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
On 03/06/16 00:18, Richard Henderson wrote:
On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
On 02/06/16 23:36, Richard Henderson wrote:
On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
I think we need to extend TCG load/store instruction attributes to
provide information about guest ordering requirements and leave this
TCG operation only for explicit barrier instruction translation.

I do not agree.  I think separate barriers are much cleaner and easier
to manage and reason about.


How are we going to emulate strongly-ordered guests on weakly-ordered
hosts then? I think if every load/store operation had to specify which
ordering it implies, this task would be quite simple.

Hum.  That does seem helpful-ish.  But I'm not certain how helpful it
is to complicate the helper functions even further.

What if we have tcg_canonicalize_memop (or some such) split off the
barriers into separate opcodes?  E.g.

MO_BAR_LD_B = 32    // prevent earlier loads from crossing current op
MO_BAR_ST_B = 64    // prevent earlier stores from crossing current op
MO_BAR_LD_A = 128    // prevent later loads from crossing current op
MO_BAR_ST_A = 256    // prevent later stores from crossing current op
MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A

// Match Sparc MEMBAR as the most flexible host.
TCG_BAR_LD_LD = 1    // #LoadLoad barrier
TCG_BAR_ST_LD = 2    // #StoreLoad barrier
TCG_BAR_LD_ST = 4    // #LoadStore barrier
TCG_BAR_ST_ST = 8    // #StoreStore barrier
TCG_BAR_SYNC  = 64    // SEQ_CST barrier

where

  tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_B | MO_BAR_ST_A)

emits

  mb        TCG_BAR_LD_LD
  qemu_ld_i32    x, y, i, m
  mb        TCG_BAR_LD_ST
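
A minimal, self-contained sketch of that split (illustrative only; the
function and stub names below are hypothetical, not the actual TCG code):

#include <stdio.h>

/* Barrier-request bits carried on the memory op (values from the proposal). */
enum {
    MO_BAR_LD_B = 32,   /* prevent earlier loads from crossing this op  */
    MO_BAR_ST_B = 64,   /* prevent earlier stores from crossing this op */
    MO_BAR_LD_A = 128,  /* prevent later loads from crossing this op    */
    MO_BAR_ST_A = 256,  /* prevent later stores from crossing this op   */
};

/* Barrier kinds for the separate "mb" opcode, modelled on Sparc MEMBAR. */
enum {
    TCG_BAR_LD_LD = 1,
    TCG_BAR_ST_LD = 2,
    TCG_BAR_LD_ST = 4,
    TCG_BAR_ST_ST = 8,
};

/* Stubs standing in for the real opcode emitters. */
static void emit_mb(int kind)       { printf("  mb           %d\n", kind); }
static void emit_qemu_ld(int memop) { printf("  qemu_ld_i32  memop=%d\n", memop); }

/* For a load: bits ordering *earlier* accesses become an mb before the op,
 * bits ordering *later* accesses become an mb after it. */
static void gen_qemu_ld_canonical(int memop)
{
    if (memop & MO_BAR_LD_B) emit_mb(TCG_BAR_LD_LD);
    if (memop & MO_BAR_ST_B) emit_mb(TCG_BAR_ST_LD);

    emit_qemu_ld(memop & ~(MO_BAR_LD_B | MO_BAR_ST_B | MO_BAR_LD_A | MO_BAR_ST_A));

    if (memop & MO_BAR_LD_A) emit_mb(TCG_BAR_LD_LD);
    if (memop & MO_BAR_ST_A) emit_mb(TCG_BAR_LD_ST);
}

int main(void)
{
    /* The example above: order earlier loads and later stores. */
    gen_qemu_ld_canonical(MO_BAR_LD_B | MO_BAR_ST_A);
    return 0;
}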

We can then add an optimization pass which folds barriers with no
memory operations in between, so that duplicates are eliminated.
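
A sketch of such a folding pass (again hypothetical types and names, not
the real TCG optimizer): merge adjacent mb opcodes when no load or store
sits between them, OR-ing their TCG_BAR_* masks so duplicates disappear.

#include <stdbool.h>
#include <stdio.h>

enum op_kind { OP_MB, OP_LD, OP_ST, OP_OTHER };

struct op {
    enum op_kind kind;
    int arg;     /* TCG_BAR_* mask when kind == OP_MB */
    bool dead;   /* true once the barrier has been merged away */
};

static void fold_barriers(struct op *ops, int n)
{
    struct op *pending = NULL;    /* last mb with no memory op after it */

    for (int i = 0; i < n; i++) {
        if (ops[i].kind == OP_MB) {
            if (pending) {
                pending->arg |= ops[i].arg;   /* merge into earlier mb */
                ops[i].dead = true;
            } else {
                pending = &ops[i];
            }
        } else if (ops[i].kind == OP_LD || ops[i].kind == OP_ST) {
            pending = NULL;       /* a memory access ends the window */
        }
    }
}

int main(void)
{
    /* Two mb ops with no load/store between them: the second is folded. */
    struct op ops[] = {
        { OP_MB, 4, false },   /* TCG_BAR_LD_ST */
        { OP_MB, 1, false },   /* TCG_BAR_LD_LD */
        { OP_LD, 0, false },
        { OP_MB, 8, false },   /* TCG_BAR_ST_ST, kept: a load intervened */
    };
    fold_barriers(ops, 4);
    for (int i = 0; i < 4; i++) {
        printf("op %d: kind=%d arg=%d dead=%d\n",
               i, ops[i].kind, ops[i].arg, ops[i].dead);
    }
    return 0;
}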

It would give us three TCG operations for each memory operation instead
of one. But then we might like to combine these barrier operations back
with memory operations in each backend. If we propagate memory ordering
semantics up to the backend, it can decide for itself which instructions
are best to generate.

A strongly ordered target would generally only set BEFORE bits or AFTER bits, but not both (and I suggest we canonicalize on AFTER for all such targets). Thus a strongly ordered target would produce only 2 opcodes per memory op.
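
For illustration (assuming the folding pass merges the two trailing
barriers into a single mb), a load with both AFTER bits set,

  tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_A | MO_BAR_ST_A)

would expand to just

  qemu_ld_i32    x, y, i, m
  mb        TCG_BAR_LD_LD | TCG_BAR_LD_ST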

I supplied both to make it easier to handle a weakly ordered target with acquire/release bits.

I would *not* combine the barrier operations back with memory operations in the backend. Only armv8 and ia64 can do that, and given the optimization level at which we generate code, I doubt it would really make much difference over separate barriers.

So I would just focus on translating only explicit memory barrier
operations for now.

Then why did you bring it up?


r~



