qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v10 0/7] trace: [tcg] Optimize per-vCPU tracing


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v10 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
Date: Tue, 11 Jul 2017 09:34:45 +0100
User-agent: Mutt/1.8.0 (2017-02-23)

On Thu, Jun 29, 2017 at 03:13:30PM +0300, Lluís Vilanova wrote:
> Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory
> accesses), making it feasible to statically enable them by default on all QEMU
> builds. Last patch shows that overheads are completely eliminated in various
> types of benchmarks for linux-user and softmmu (overheads where up to 2x
> before).
> 
> 
> Right now, events with the 'tcg' property always generate TCG code to trace 
> that
> event at guest code execution time, where the event's dynamic state is 
> checked.
> 
> This series adds a performance optimization where TCG code for events with the
> 'tcg' and 'vcpu' properties is not generated if the event is dynamically
> disabled. This optimization raises two issues:
> 
> * An event can be dynamically disabled/enabled after the corresponding TCG 
> code
>   has been generated (i.e., a new TB with the corresponding code should be
>   used).
> 
> * Each vCPU can have a different dynamic state for the same event (i.e., 
> tracing
>   the memory accesses of only one process pinned to a vCPU).
> 
> To handle both issues, this series integrates the dynamic tracing event state
> into the TB hashing function, so that vCPUs tracing different events will use
> separate TBs. Note that only events with the 'vcpu' property are used for
> hashing (as stored in the bitmap of #CPUState::trace_dstate).
> 
> This makes dynamic event state changes on vCPUs very efficient, since they can
> use TBs produced by other vCPUs while on the same event state combination (or
> produced by the same vCPU, earlier).
> 
> Discarded alternatives:
> 
> * Emitting TCG code to check if an event needs tracing, where we should still
>   move the tracing call code to either a cold path (making tracing performance
>   worse), or leave it inlined (making non-tracing performance worse).
> 
> * Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling
>   it on a single vCPU will impact the performance of all other vCPUs that are
>   not tracing that event.
> 
> Signed-off-by: Lluís Vilanova <address@hidden>
> ---
> 
> Changes in v10
> ==============
> 
> * Replace lingering trace_get_vcpu_event_count() with
>   CPU_TRACE_DSTATE_MAX_EVENTS [Emilio G. Cota].
> * Add performance results for dbt-bench [Emilio G. Cota].
> 
> 
> Changes in v9
> =============
> 
> * Rebase on 931892e8a6.
> * Undo renaming of tb->trace_vcpu_dstate to the shorter tb->trace_ds.
> * Add measurements to commit enabling all guest events.
> 
> 
> Changes in v8
> =============
> 
> [Emilio G. Cota]
> 
> * Ported to current dev tree.
> 
> * Allocate cpu->trace_dstate statically. This
>   * allows us to drop the event_count inline patch.
>   * simplifies and improves the performance of accessing cpu->trace_dstate:
>     we just need to dereference, instead of going through bitmap_copy and
>     an intermediate unsigned long.
> 
> * If we try to register more CPU events than the max we support (there's a
>   constant for it), drop the event and tell the user with error_report. But
>   really this is a bug, since we control what CPU events are traceable. Should
>   we abort() as well?
> 
> * Added rth's R-b tag
> 
> * Addressed my own comments:
>   * rename tb->trace_vcpu_dstate to the shorter tb->trace_ds
>   * use uint32_t for tb->trace_ds instead of a typedef
>   * add BUILD_BUG_ON check to make sure tb->trace_ds is big enough
>   * fix xxhash
> 
> * Do not add trace_dstate to tb_htable_lookup, since we can grab it from
>   cpu->trace_dstate.
> 
> This patchset applies cleanly on top of rth's tcg-next (a01792e1e).
> 
> 
> Changes in v7
> =============
> 
> * Fix delayed dstate changes (now uses async_run_on_cpu() as suggested by 
> Paolo
>   Bonzini).
> 
> * Note to Richard: patch 4 has been adapted to the new patch 3 async callback,
>   but is essentially the same.
> 
> 
> Changes in v6
> =============
> 
> * Check hashing size error with QEMU_BUILD_BUG_ON [Richard Henderson].
> 
> 
> Changes in v5
> =============
> 
> * Move define into "qemu-common.h" to allow compilation of tests.
> 
> 
> Changes in v4
> =============
> 
> * Incorporate trace_dstate into the TB hashing function instead of using
>   multiple physical TB caches [suggested by Richard Henderson].
> 
> 
> Changes in v3
> =============
> 
> * Rebase on 0737f32daf.
> * Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi].
> * Refactor trace_get_vcpu_event_count() to be inlinable.
> * Optimize cpu_tb_cache_set_requested() (hottest path).
> 
> 
> Changes in v2
> =============
> 
> * Fix bitmap copy in cpu_tb_cache_set_apply().
> * Split generated code re-alignment into a separate patch [Daniel P. 
> Berrange].
> 
> Lluís Vilanova (7):
>       exec: [tcg] Refactor flush of per-CPU virtual TB cache
>       trace: Allocate cpu->trace_dstate in place
>       trace: [tcg] Delay changes to dynamic state when translating
>       exec: [tcg] Use different TBs according to the vCPU's dynamic tracing 
> state
>       trace: [tcg] Do not generate TCG code to trace dynamically-disabled 
> events
>       trace: [tcg,trivial] Re-align generated code
>       trace: [trivial] Statically enable all guest events
> 
> 
>  accel/tcg/cpu-exec.c                     |    8 ++++++--
>  accel/tcg/cputlb.c                       |    2 +-
>  accel/tcg/translate-all.c                |   26 +++++++++++++++++++-------
>  include/exec/exec-all.h                  |   12 ++++++++++++
>  include/exec/tb-hash-xx.h                |    7 +++++--
>  include/exec/tb-hash.h                   |    5 +++--
>  include/qom/cpu.h                        |   12 ++++++------
>  qom/cpu.c                                |    8 --------
>  scripts/tracetool/__init__.py            |    3 ++-
>  scripts/tracetool/backend/dtrace.py      |    4 ++--
>  scripts/tracetool/backend/ftrace.py      |   20 ++++++++++----------
>  scripts/tracetool/backend/log.py         |   19 ++++++++++---------
>  scripts/tracetool/backend/simple.py      |    4 ++--
>  scripts/tracetool/backend/syslog.py      |    6 +++---
>  scripts/tracetool/backend/ust.py         |    4 ++--
>  scripts/tracetool/format/h.py            |   26 +++++++++++++++++++-------
>  scripts/tracetool/format/tcg_h.py        |   21 +++++++++++++++++----
>  scripts/tracetool/format/tcg_helper_c.py |    5 +++--
>  tcg/tcg-runtime.c                        |    3 ++-
>  tests/qht-bench.c                        |    2 +-
>  trace-events                             |    6 +++---
>  trace/control-target.c                   |   21 ++++++++++++++++++---
>  trace/control.c                          |    9 ++++++++-
>  trace/control.h                          |    3 +++
>  24 files changed, 157 insertions(+), 79 deletions(-)
> 
> 
> To: address@hidden
> Cc: Stefan Hajnoczi <address@hidden>
> Cc: Eduardo Habkost <address@hidden>
> Cc: Eric Blake <address@hidden>
> Cc: Emilio G. Cota <address@hidden>
> 

Besides Emilio's comments:

Reviewed-by: Stefan Hajnoczi <address@hidden>

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]