From: Lassi Tuura
Subject: [Libunwind-devel] [PATCH 0/1] Fast back-trace for x86_64 for only collecting the call stack
Date: Sat, 24 Apr 2010 11:37:12 +0200

Hi,

This patch adds a new function that performs a pure stack walk without
unwinding. It is functionally similar to backtrace() but accelerated by
an address attribute cache that the caller maintains across calls.

For now the feature is implemented only for x86_64 Linux.  The patch
also slightly improves DWARF-less RBP-based frame-chain traversal.

The patch adds a new test to ensure the feature works properly,
i.e. that it returns the same addresses as backtrace().  The test
allows a fudge factor: backtrace() seems to return off-by-one
addresses in some cases, which the client probably shouldn't use for
symbol lookup.  Please see the ongoing discussion on is_interrupted
vs. is_signal vs. use_prev_instr.
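
As an illustration of what such a tolerant comparison could look like,
here is a minimal sketch.  It is not the code from tests/Gtest-trace.c,
which is not shown in this message; the name traces_match and the
fudge parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if both traces have the same depth and
   every pair of addresses differs by at most `fudge` bytes, else 0. */
static int traces_match(void *const *a, size_t na,
                        void *const *b, size_t nb, uintptr_t fudge)
{
  size_t i;

  assert(a != NULL && b != NULL);
  if (na != nb)
    return 0;

  for (i = 0; i < na; ++i) {
    uintptr_t x = (uintptr_t) a[i];
    uintptr_t y = (uintptr_t) b[i];
    uintptr_t d = x > y ? x - y : y - x;   /* absolute difference */

    if (d > fudge)
      return 0;
  }
  return 1;
}
```

With fudge set to 1, a frame reported as 0x1001 by one walker and
0x1000 by the other still counts as a match, while a fudge of 0
demands exact equality.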

Some statistics on the unwinding improvements follow.  I used igprof
on three single-threaded applications, once with memory allocation
tracing that samples every malloc() call, and once with statistical
performance profiling that samples at a 6.0 ms interval.  Without
instrumentation the applications run for 77-734 seconds, fully utilise
one CPU, and perform 63M-606M memory allocations: 800-900k per second
on average.  Each application loads 250 MB worth of code from 594
shared libraries.

The test used a 2-million-entry address cache and hit up to 282k
unique call sites.  The trace spent 100-130 TSC cycles per cached
address on RHEL5.4, GCC 4.5.0, 2x4-core Intel Xeon E5410 2.33GHz,
16 GB RAM.

In the results, walks is the number of stack walks; frames is the
total number of stack frames; steps is the number of unw_step() calls;
orig and prof are the user + system times in seconds for the original
and instrumented application runs.  System time is negligible in all
cases except normal memory tracing, which had a ~15% system-to-user
time ratio.

Performance profiling results: normal tracing (first block) vs. fast
tracing (second block).

 app       orig   prof          walks     frames    steps
 minbias   76.8   78.9 +2.7%   12'993    533'374  =frames
 ttbar    333.8  338.6 +1.4%   56'167  2'076'515  =frames
 qcd      733.6  740.1 +0.9%  123'045  4'294'584  =frames

 minbias   76.8   78.2 +1.8%   12'866    530'533   18'203
 ttbar    333.8  334.2 +0.1%   55'441  2'048'892   39'702
 qcd      733.6  732.2 -0.2%  121'723  4'250'743   50'711

Memory allocation tracing results: normal tracing (first block) vs.
fast tracing (second block).

 app       orig   prof          walks     frames    steps
 minbias   76.8   1145 +1390%   63.3M      2305M  =frames
 ttbar    333.8   5137 +1439%  299.6M     10193M  =frames
 qcd      733.6  10507 +1332%  605.6M     20210M  =frames

 minbias   76.8  299.1 +289%    63.3M      2305M  277'744
 ttbar    333.8   1185 +255%   299.6M     10193M  281'636
 qcd      733.6   2400 +227%   605.6M     20210M  281'576
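
The percentage column and a per-walk cost can be recomputed from the
orig, prof and walks columns.  The sketch below shows the arithmetic
only; the function names are hypothetical, and the per-walk figure is
derived from the table rather than measured separately.

```c
#include <assert.h>

/* Relative overhead of the instrumented run over the original, in %. */
static double overhead_percent(double orig_s, double prof_s)
{
  assert(orig_s > 0.0);
  return (prof_s - orig_s) / orig_s * 100.0;
}

/* Average time added per stack walk, in microseconds. */
static double added_us_per_walk(double orig_s, double prof_s, double walks)
{
  assert(walks > 0.0);
  return (prof_s - orig_s) / walks * 1e6;
}
```

For the minbias memory tracing rows, added_us_per_walk(76.8, 1145,
63.3e6) comes to roughly 16.9 microseconds per walk with normal
tracing, and added_us_per_walk(76.8, 299.1, 63.3e6) to roughly 3.5
microseconds with fast tracing.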

The address cache for fast trace is a probed hash.  It must be large
enough, or performance falls off a cliff and becomes *much* slower than
the unw_step() loop was.  Combined probe distribution for the three
memory profiles:

 781397  >trace_lookup: updating slot after 0 steps
  50970  >trace_lookup: updating slot after 1 steps
   6911  >trace_lookup: updating slot after 2 steps
   1235  >trace_lookup: updating slot after 3 steps
    330  >trace_lookup: updating slot after 4 steps
     68  >trace_lookup: updating slot after 5 steps
     31  >trace_lookup: updating slot after 6 steps
      8  >trace_lookup: updating slot after 7 steps
      3  >trace_lookup: updating slot after 8 steps
      1  >trace_lookup: updating slot after 9 steps
      1  >trace_lookup: updating slot after 10 steps
      1  >trace_lookup: updating slot after 11 steps
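
To show how such a probe count arises, here is a minimal sketch of an
open-addressing cache with linear probing.  It is an assumption-laden
illustration, not the trace_lookup code from the patch: the hash
function, the probing scheme, and the names slot_t, addr_cache_t and
cache_lookup are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uintptr_t key;    /* instruction address; 0 marks an empty slot */
  uint32_t attr;    /* hypothetical cached attribute for the address */
} slot_t;

typedef struct {
  slot_t *slots;
  size_t capacity;  /* must be a power of two */
} addr_cache_t;

/* Find the slot for `ip`, claiming an empty one if it is not cached yet.
   *steps receives the number of probes beyond the home slot.
   Returns NULL when the table is full and `ip` is not present. */
static slot_t *cache_lookup(addr_cache_t *c, uintptr_t ip, unsigned *steps)
{
  size_t mask, i, n;
  uint64_t h;

  assert(c != NULL && steps != NULL && ip != 0);
  assert(c->capacity != 0 && (c->capacity & (c->capacity - 1)) == 0);

  mask = c->capacity - 1;
  h = (uint64_t) ip * 0x9E3779B97F4A7C15ull;  /* multiplicative hash */
  i = (size_t) (h >> 32) & mask;              /* home slot */

  for (n = 0; n < c->capacity; ++n, i = (i + 1) & mask) {
    slot_t *s = &c->slots[i];

    if (s->key == ip || s->key == 0) {
      s->key = ip;
      *steps = (unsigned) n;
      return s;
    }
  }
  *steps = (unsigned) n;
  return NULL;
}
```

As the table fills up, more lookups land on occupied slots and the
step counts grow, which is the cliff described above.  In the
distribution above roughly 93% of the updates needed 0 steps, which is
what a sparsely filled table looks like.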

Regards,
Lassi
---

Lassi Tuura (1):
     Fast back-trace for x86_64 for only collecting the call stack.


include/dwarf.h                   |    1 
include/libunwind-x86_64.h        |   33 +++
include/tdep-arm/libunwind_i.h    |    1 
include/tdep-hppa/libunwind_i.h   |    1 
include/tdep-ia64/libunwind_i.h   |    1 
include/tdep-mips/libunwind_i.h   |    1 
include/tdep-ppc32/libunwind_i.h  |    3 
include/tdep-ppc64/libunwind_i.h  |    3 
include/tdep-x86/libunwind_i.h    |    1 
include/tdep-x86_64/libunwind_i.h |    6 -
src/Makefile.am                   |    8 -
src/arm/init.h                    |    1 
src/dwarf/Gparser.c               |    4 
src/hppa/init.h                   |    1 
src/mips/init.h                   |    1 
src/ppc32/init.h                  |    1 
src/ppc64/init.h                  |    1 
src/x86/init.h                    |    1 
src/x86_64/Ginit_local.c          |    4 
src/x86_64/Gos-linux.c            |   33 +--
src/x86_64/Gstash_frame.c         |   92 ++++++++
src/x86_64/Gstep.c                |   44 +++-
src/x86_64/Gtrace.c               |  401 +++++++++++++++++++++++++++++++++++++
src/x86_64/Lstash_frame.c         |    5 
src/x86_64/Ltrace.c               |    5 
src/x86_64/init.h                 |    1 
tests/Gtest-trace.c               |  265 ++++++++++++++++++++++++
tests/Ltest-trace.c               |    5 
tests/Makefile.am                 |    3 
tests/check-namespace.sh.in       |    6 +
30 files changed, 892 insertions(+), 41 deletions(-)
create mode 100644 src/x86_64/Gstash_frame.c
create mode 100644 src/x86_64/Gtrace.c
create mode 100644 src/x86_64/Lstash_frame.c
create mode 100644 src/x86_64/Ltrace.c
create mode 100644 tests/Gtest-trace.c
create mode 100644 tests/Ltest-trace.c




