From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH] aarch64: use TSX for ldrex/strex
Date: Thu, 18 Aug 2016 08:38:47 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/17/2016 11:41 AM, Richard Henderson wrote:
> On 08/17/2016 10:58 AM, Emilio G. Cota wrote:
>> (2) that we should start a new TB upon encountering a load-exclusive, so
>> that we maximize the chance of the store-exclusive being a part of the same
>> TB and thus have *nothing* extra between the beginning and commit of the
>> transaction.
>>
>> I don't know how to do this. If it's easy to do, please let me know how
>> (for aarch64 at least, since that's the target I'm using).
>
> It's a simple matter of peeking at the next instruction.
>
> One way is to partially decode the insn before advancing the PC.
>
>  static void disas_a64_insn (CPUARMState *env, DisasContext *s, int num_insns)
>  {
>     uint32_t insn = arm_ldl_code(env, s->pc, s->sctlr_b);
> +
> +   if (num_insns > 1 && (insn & xxx) == yyy) {
> +       /* Start load-exclusive in a new TB.  */
> +       s->is_jmp = DISAS_UPDATE;
> +       return;
> +   }
>     s->insn = insn;
>     s->pc += 4;
> ...
>
> Alternately, store num_insns into DisasContext, and do pc -= 4 in
> disas_ldst_excl.

Actually, the mask check is the only really viable solution, and it needs to happen before we do the tcg_gen_insn_start thing.
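
Concretely, something along these lines (untested sketch; the mask/value pair is
my reading of the A64 encoding, where the load/store-exclusive class has
insn[29:24] == 0b001000 and bit 22 is the L bit, so this also catches the plain
load-acquire forms in that class):

  static inline bool a64_insn_is_load_excl(uint32_t insn)
  {
      /* Load/store-exclusive class with L == 1 (also matches LDAR etc.).  */
      return (insn & 0x3f400000) == 0x08400000;
  }

  /* ... and in the translation loop, before tcg_gen_insn_start():  */
      insn = arm_ldl_code(env, dc->pc, dc->sctlr_b);
      if (num_insns > 0 && a64_insn_is_load_excl(insn)) {
          /* End the current TB; the load-exclusive starts the next one.  */
          dc->is_jmp = DISAS_UPDATE;
          break;
      }
      tcg_gen_insn_start(dc->pc, 0, 0);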

A couple of other notes, as I've thought about this some more.

If the start and end of the transaction are not in the same TB, the likelihood of transaction failure should be very near 100%. Consider:

  * TB with ldrex ends before the strex.

  * Since the next TB hasn't been built yet, we'll definitely go
    through tb_find_physical, through the translator, and through
    the tcg compiler.

    (a) Which I think we can definitely assume will exhaust any
        resources associated with the transaction.
    (b) Which will abort the transaction,
    (c) Which, with the current code, will retry N times, with
        identical results, failing within the compiler each time,
    (d) Which, with the current code, will single-step through
        to the strex, as you saw.

  * Since we proceed to (d) the first time, we'll never succeed
    in creating the next TB, so we'll always iterate compilation N
    times, resulting in the single-step.

This is probably the real slow-down that you see.

Therefore, we must abort any transaction whenever we exit tcg-generated code, whether through cpu_loop_exit or through the tcg epilogue. We should be able to use the software-controlled bits associated with the abort to tell what kind of event led to the abort. However, we must bear in mind that (for both x86 and ppc at least) we only have an 8-bit abort code, so we can't pass back a pointer, for instance.
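
For what it's worth, an x86/RTM sketch of that (the TCG_XABORT_* names are made
up for illustration; the real point is that _xabort() only carries an 8-bit
immediate, which the ldrex side reads back with _XABORT_CODE()):

  #include <immintrin.h>               /* RTM intrinsics; build with -mrtm */
  #include <stdbool.h>

  #define TCG_XABORT_EXIT_LOOP  0x01   /* left generated code via cpu_loop_exit */
  #define TCG_XABORT_EPILOGUE   0x02   /* fell out through the tcg epilogue */

  /* Call on the cpu_loop_exit path out of tcg-generated code; the epilogue
     path would do the same with TCG_XABORT_EPILOGUE.  */
  static inline void abort_pending_transaction(void)
  {
      if (_xtest()) {                  /* is a transaction still open? */
          _xabort(TCG_XABORT_EXIT_LOOP);
      }
  }

  /* The ldrex side can then see why it failed from the _xbegin() status.  */
  static inline bool aborted_because_we_left_tcg(unsigned status)
  {
      return (status & _XABORT_EXPLICIT)
          && _XABORT_CODE(status) == TCG_XABORT_EXIT_LOOP;
  }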

We should think about what kinds of limitations we should accept for handling ll/sc via transactions.

  * How do we handle unpaired ldrexd / ldxp?  This is used by the compiler,
    as it's the only way to perform a double-word atomic load.

    This implies that we need some sort of counter, beyond which we stop
    trying to succeed via transaction (a minimal sketch of such a counter
    follows at the end of this list).

  * In order to make normal cmpxchg patterns work, we have to be able to
    handle a branch within an ll/sc sequence.  Options:

    * Less complex way is to build a TB, including branches, with a max
      of N insns along the branch-not-taken path, searching for the strex.
      But of course this fails to handle legitimate patterns for arm
      (and other ll/sc guests).

      However, gcc code generation will generally annotate the cmpxchg
      failure branch as not-taken, so perhaps this will work well enough
      in practice.

    * More complex way is to build a TB, including branches, with a max
      of N insns along *all* paths, searching for the strex.  This runs
      into problems with, among other things, branches crossing pages.

    * Most complex way is to somehow get all of the TBs built, and
      linked together, preferably before we even try executing
      (and failing the transaction in) the first TB.
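
As for the counter mentioned in the first bullet, a minimal sketch (the struct
and field names are invented, and the fallback is whatever the non-transactional
ll/sc emulation already does):

  #include <stdbool.h>

  #define EXCL_TX_MAX_RETRIES 4        /* arbitrary cutoff for the sketch */

  /* Hypothetical per-vcpu bookkeeping; not an existing CPUARMState field.  */
  typedef struct ExclTxState {
      int consecutive_failures;
  } ExclTxState;

  static bool ldrex_should_use_transaction(const ExclTxState *tx)
  {
      return tx->consecutive_failures < EXCL_TX_MAX_RETRIES;
  }

  static void ldrex_transaction_failed(ExclTxState *tx)
  {
      /* An unpaired ldrexd/ldxp fails every time; past the cutoff we stop
         trying transactions and take the fallback path instead.  */
      tx->consecutive_failures++;
  }

  static void strex_transaction_committed(ExclTxState *tx)
  {
      tx->consecutive_failures = 0;
  }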


r~


