qemu-devel
From: Alex Bennée
Subject: Re: [Qemu-devel] MTTCG status updates, benchmark results and KVM forum plans
Date: Tue, 16 Aug 2016 12:16:26 +0100
User-agent: mu4e 0.9.17; emacs 25.1.6

Emilio G. Cota <address@hidden> writes:

> On Mon, Aug 15, 2016 at 11:46:32 +0100, Alex Bennée wrote:
>> As far as I'm aware the following work is still ongoing:
>>
>> Emilio: cmpxchg atomics
>> Alvise: LL/SC modelling
>
> I've been tinkering with an experimental patch to do proper LL/SC. The idea
> is to rely on hardware transactional memory, so that stores don't have
> to be tracked. The trickiest thing is the fallback path, for which I'm
> trying to (ab)use EXCP_ATOMIC to execute exclusively from the ldrex
> all the way to the strex.
>
> To test it, I'm using aarch64-linux-user running qht-bench compiled on
> an aarch64 machine. I'm running on an Intel Skylake host (Skylake has
> no known TSX bugs).
>
> However, I'm finding issues that might not have to do with the
> patch itself.
>
> - On the latest MTTCG+cmpxchg tree (45c11751ed7 a.k.a.
>   bennee/mttcg/base-patches-v4-with-cmpxchg-atomics-v2), QEMU loops
>   forever without making progress in the instruction stream, even
>   with taskset -c 0.

Could this be a store-after-load barrier issue? I have a branch that
adds Pranith's work:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2-and-barriers-v4

This seems to have eliminated some of the failure modes (usually the
kernel complaining about stalled tasks) but I'm still seeing my test
case fail from time to time when starting the benchmark task. Currently
I'm not seeing much information about why it's failing to start, though.

> - On the cmpxchg tree (rth's atomic-2 branch [1]), it works more
>   reliably, although tb_lock is held around tb_find_fast so parallelism isn't
>   very high. Still, it sometimes triggers the assert below.
>   - Applying the "remove tb_lock around hot path" patch makes it
>     easier to trigger this assert in cpu-exec.c:650 (approx.):
>             /* Assert that the compiler does not smash local variables. */
>             g_assert(cpu == current_cpu)
>     I've also seen the assert immediately after that one trigger, as well
>     as the rcu_read_unlock depth assert.

Odd - these are remnants of a dodgy compiler.

>   The asserts are usually triggered when all threads exit (by returning
>   NULL) at roughly the same time.
>   However, they cannot be triggered with taskset -c 0, which makes me
>   suspect that somehow start_exclusive isn't working as intended.
>
> Any tips would be appreciated! I'll reply with a patch that uses RTM,
> the one below is fallback path all the way, and the best to reproduce
> the above.

I'll see if I can reproduce the errors you're seeing on my setup.

>
> Thanks,
>
>               Emilio
>
> [1] https://github.com/rth7680/qemu/commits/atomic-2
>
> From ed6af6eb364e5a36e81d7cc8143c0e9783c50587 Mon Sep 17 00:00:00 2001
> From: "Emilio G. Cota" <address@hidden>
> Date: Mon, 15 Aug 2016 00:27:42 +0200
> Subject: [PATCH] aarch64: use TSX for ldrex/strex (fallback path only)
>
> Signed-off-by: Emilio G. Cota <address@hidden>
> ---
>  linux-user/main.c          |  5 +++--
>  target-arm/helper-a64.c    | 23 +++++++++++++++++++++++
>  target-arm/helper-a64.h    |  4 ++++
>  target-arm/translate-a64.c | 15 +++++++++------
>  4 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 9880505..6922faa 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -192,8 +192,9 @@ static void step_atomic(CPUState *cpu)
>
>      /* Since we got here, we know that parallel_cpus must be true.  */
>      parallel_cpus = false;
> -    cpu_exec_step(cpu);
> -    parallel_cpus = true;
> +    while (!parallel_cpus) {
> +        cpu_exec_step(cpu);
> +    }
>
>      end_exclusive();
>  }
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index 8ce518b..a97b631 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -579,3 +579,26 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
>
>      return !success;
>  }
> +
> +void HELPER(xbegin)(CPUARMState *env)
> +{
> +    uintptr_t ra = GETPC();
> +
> +    if (parallel_cpus) {
> +        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
> +    }
> +}
> +
> +void HELPER(xend)(void)
> +{
> +    assert(!parallel_cpus);
> +    parallel_cpus = true;
> +}
> +
> +uint64_t HELPER(x_ok)(void)
> +{
> +    if (!parallel_cpus) {
> +        return 1;
> +    }
> +    return 0;
> +}
> diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
> index dd32000..e7ede43 100644
> --- a/target-arm/helper-a64.h
> +++ b/target-arm/helper-a64.h
> @@ -48,3 +48,7 @@ DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
>  DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
>  DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
>  DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
> +
> +DEF_HELPER_1(xbegin, void, env)
> +DEF_HELPER_0(x_ok, i64)
> +DEF_HELPER_0(xend, void)
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index 450c359..cfcf440 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -1760,6 +1760,8 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>      TCGv_i64 tmp = tcg_temp_new_i64();
>      TCGMemOp be = s->be_data;
>
> +    gen_helper_xbegin(cpu_env);
> +
>      g_assert(size <= 3);
>      if (is_pair) {
>          TCGv_i64 hitmp = tcg_temp_new_i64();
> @@ -1825,6 +1827,9 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>      tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label);
>
>      tmp = tcg_temp_new_i64();
> +    gen_helper_x_ok(tmp);
> +    tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, fail_label);
> +
>      if (is_pair) {
>          if (size == 2) {
>              TCGv_i64 val = tcg_temp_new_i64();
> @@ -1844,16 +1849,14 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>          }
>      } else {
>          TCGv_i64 val = cpu_reg(s, rt);
> -        tcg_gen_atomic_cmpxchg_i64(tmp, addr, cpu_exclusive_val, val,
> -                                   get_mem_index(s),
> -                                   size | MO_ALIGN | s->be_data);
> -        tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, cpu_exclusive_val);
> +        tcg_gen_qemu_st_i64(val, addr, get_mem_index(s), s->be_data + size);
>      }
>
>      tcg_temp_free_i64(addr);
> -
> -    tcg_gen_mov_i64(cpu_reg(s, rd), tmp);
>      tcg_temp_free_i64(tmp);
> +
> +    tcg_gen_movi_i64(cpu_reg(s, rd), 0);
> +    gen_helper_xend();
>      tcg_gen_br(done_label);
>
>      gen_set_label(fail_label);


--
Alex Bennée


