From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH] aarch64: use TSX for ldrex/strex
Date: Thu, 18 Aug 2016 08:38:47 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/17/2016 11:41 AM, Richard Henderson wrote:
> On 08/17/2016 10:58 AM, Emilio G. Cota wrote:
>> (2) that we should start a new TB upon encountering a load-exclusive, so
>> that we maximize the chance of the store-exclusive being a part of the same
>> TB and thus have *nothing* extra between the beginning and commit of the
>> transaction.
>>
>> I don't know how to do this. If it's easy to do, please let me know how
>> (for aarch64 at least, since that's the target I'm using).
>
> It's a simple matter of peeking at the next instruction.
>
> One way is to partially decode the insn before advancing the PC.
>
>  static void disas_a64_insn (CPUARMState *env, DisasContext *s, int num_insns)
>  {
>     uint32_t insn = arm_ldl_code(env, s->pc, s->sctlr_b);
> +
> +   if (num_insns > 1 && (insn & xxx) == yyy) {
> +       /* Start load-exclusive in a new TB.  */
> +       s->is_jmp = DISAS_UPDATE;
> +       return;
> +   }
>     s->insn = insn;
>     s->pc += 4;
> ...
>
> Alternately, store num_insns into DisasContext, and do pc -= 4 in
> disas_ldst_excl.

Actually, the mask check is the only really viable solution, and it needs to happen before we do the tcg_gen_insn_start thing.
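
Concretely, something along these lines (untested sketch; the mask/value pair is
my reading of the A64 encoding, where the load/store-exclusive class has
insn[29:24] == 0b001000 and bit 22 is the L bit, so this also catches the plain
load-acquire forms in that class):

  static inline bool a64_insn_is_load_excl(uint32_t insn)
  {
      /* Load/store-exclusive class with L == 1 (also matches LDAR etc.).  */
      return (insn & 0x3f400000) == 0x08400000;
  }

  /* ... and in the translation loop, before tcg_gen_insn_start():  */
      insn = arm_ldl_code(env, dc->pc, dc->sctlr_b);
      if (num_insns > 0 && a64_insn_is_load_excl(insn)) {
          /* End the current TB; the load-exclusive starts the next one.  */
          dc->is_jmp = DISAS_UPDATE;
          break;
      }
      tcg_gen_insn_start(dc->pc, 0, 0);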

A couple of other notes, as I've thought about this some more.

If the start and end of the transaction are not in the same TB, the likelihood of transaction failure should be very near 100%. Consider:

  * TB with ldrex ends before the strex.

  * Since the next TB hasn't been built yet, we'll definitely go
    through tb_find_physical, through the translator, and through
    the tcg compiler.

    (a) Which I think we can definitely assume will exhaust any
        resources associated with the transaction.
    (b) Which will abort the transaction,
    (c) Which, with the current code, will retry N times, with
        identical results, failing within the compiler each time,
    (d) Which, with the current code, will single-step through
        to the strex, as you saw.

  * Since we proceed to (d) the first time, we'll never succeed
    in creating the next TB, so we'll always iterate compilation N
    times, resulting in the single-step.

This is probably the real slow-down that you see.

Therefore, we must abort any transaction whenever we exit tcg-generated code, whether through cpu_loop_exit or through the tcg epilogue. We should be able to use the software-controlled bits associated with the abort to tell what kind of event led to the abort. However, we must bear in mind that (for both x86 and ppc at least) we only have an 8-bit abort code, so we can't pass back a pointer, for instance.
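
For what it's worth, an x86/RTM sketch of that (the TCG_XABORT_* names are made
up for illustration; the real point is that _xabort() only carries an 8-bit
immediate, which the ldrex side reads back with _XABORT_CODE()):

  #include <immintrin.h>               /* RTM intrinsics; build with -mrtm */
  #include <stdbool.h>

  #define TCG_XABORT_EXIT_LOOP  0x01   /* left generated code via cpu_loop_exit */
  #define TCG_XABORT_EPILOGUE   0x02   /* fell out through the tcg epilogue */

  /* Call on the cpu_loop_exit path out of tcg-generated code; the epilogue
     path would do the same with TCG_XABORT_EPILOGUE.  */
  static inline void abort_pending_transaction(void)
  {
      if (_xtest()) {                  /* is a transaction still open? */
          _xabort(TCG_XABORT_EXIT_LOOP);
      }
  }

  /* The ldrex side can then see why it failed from the _xbegin() status.  */
  static inline bool aborted_because_we_left_tcg(unsigned status)
  {
      return (status & _XABORT_EXPLICIT)
          && _XABORT_CODE(status) == TCG_XABORT_EXIT_LOOP;
  }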

We should think about what kinds of limitations we should accept for handling ll/sc via transactions.

  * How do we handle unpaired ldrexd / ldxp?  This is used by the compiler,
    as it's the only way to perform a double-word atomic load.

    This implies that we need some sort of counter, beyond which we stop
    trying to succeed via transaction (a minimal sketch of such a counter
    follows at the end of this list).

  * In order to make normal cmpxchg patterns work, we have to be able to
    handle a branch within an ll/sc sequence.  Options:

    * Less complex way is to build a TB, including branches, with a max
      of N insns along the branch-not-taken path, searching for the strex.
      But of course this fails to handle legitimate patterns for arm
      (and other ll/sc guests).

      However, gcc code generation will generally annotate the cmpxchg
      failure branch as not-taken, so perhaps this will work well enough
      in practice.

    * More complex way is to build a TB, including branches, with a max
      of N insns along *all* paths, searching for the strex.  This runs
      into problems with, among other things, branches crossing pages.

    * Most complex way is to somehow get all of the TBs built, and
      linked together, preferably before we even try executing
      (and failing the transaction in) the first TB.
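
As for the counter mentioned in the first bullet, a minimal sketch (the struct
and field names are invented, and the fallback is whatever the non-transactional
ll/sc emulation already does):

  #include <stdbool.h>

  #define EXCL_TX_MAX_RETRIES 4        /* arbitrary cutoff for the sketch */

  /* Hypothetical per-vcpu bookkeeping; not an existing CPUARMState field.  */
  typedef struct ExclTxState {
      int consecutive_failures;
  } ExclTxState;

  static bool ldrex_should_use_transaction(const ExclTxState *tx)
  {
      return tx->consecutive_failures < EXCL_TX_MAX_RETRIES;
  }

  static void ldrex_transaction_failed(ExclTxState *tx)
  {
      /* An unpaired ldrexd/ldxp fails every time; past the cutoff we stop
         trying transactions and take the fallback path instead.  */
      tx->consecutive_failures++;
  }

  static void strex_transaction_committed(ExclTxState *tx)
  {
      tx->consecutive_failures = 0;
  }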


r~


