[Libunwind-devel] Another optimisation for x86-64 fast trace

From: Lassi Tuura
Subject: [Libunwind-devel] Another optimisation for x86-64 fast trace
Date: Tue, 29 Mar 2011 19:39:08 +0200


Here's one more small performance patch for x86-64 fast trace: a slightly 
lighter getcontext. It reduces the number of clock cycles spent in getcontext 
by about a factor of 6, which gives a small but consistent improvement (~3-5%, 
varies by run) on the entire backtrace().

With this, the cycles per stack walk and per level (average/rms) have evolved 
as follows:
 - my original patches:
     per walk: 1502 / 2570; per level 55.0 / 222.4
     total run time: 1088 s (but slower profiler core from other results below)
 - after merge + patch to separate unw_backtrace() symbol:
     per walk: 2373 / 2886; per level 90.2 / 204.7
     total run time: 1130 s
 - + other patches I sent before:
     per walk: 1500 / 2595; per level 57.1 / 200.9
     total run time: 969 s
 - + this patch:
     per walk: 1445 / 2604; per level 55.0 / 208.7
     (actually some runs were better than this by a few % points,
      but their full application run time was longer, so I used this one)
     total run time: 959 s
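
To put the last two steps of the list in relative terms, here is a minimal C 
sketch. The helper name percent_improvement is made up for illustration and is 
not part of libunwind or of the patch; it only does the arithmetic on the 
figures quoted above.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only: relative improvement in
 * percent when a cost goes from `before` to `after`. A positive result
 * means `after` is smaller, i.e. cheaper. */
static double percent_improvement(double before, double after)
{
    assert(before > 0.0);   /* avoid dividing by zero or a negative cost */
    return 100.0 * (before - after) / before;
}
```

Fed with the average cycles per walk (1500 before this patch, 1445 after), it 
gives roughly 3.7%, in line with the ~3-5% quoted for backtrace(). Fed with 
the total run times (969 s and 959 s), it gives roughly 1.0%.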

In case you are interested, I put CPU cycle views of the source code and 
assembler on the web, before (a) and after (b) this change. The "before" view 
also predates my unw_tdep_trace -> tdep_trace changes posted today, and the 
"after" view includes unrelated optimisations in e.g. "dumpOneProfile", which 
do not affect the libunwind performance figures.


BTW, it occurs to me that the tdep_trace off-by-one could be solved by having 
getcontext_trace return a context one stack level further up. Then tdep_trace 
wouldn't need the 'if' that guards filling the 'address' array.
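
As a rough illustration of that idea only, here is a toy C sketch. It is not 
libunwind code: the stack is modelled as a plain array of return addresses, 
frames[0] stands for the walker's own frame, and both function names are 
hypothetical. The first variant skips its own frame with a test inside the 
loop; the second assumes the starting context already points one level up, so 
the loop body stores unconditionally.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, for illustration only: frames[0] is the walker's own frame,
 * frames[1..] are the callers we actually want to record. */

/* Variant A: start at our own frame and skip it with a test in the loop. */
static size_t walk_with_guard(void *const *frames, size_t nframes,
                              void **address, size_t max)
{
    size_t depth = 0;
    for (size_t i = 0; i < nframes && depth < max; ++i) {
        if (i > 0)                      /* the guard: drop our own frame */
            address[depth++] = frames[i];
    }
    return depth;
}

/* Variant B: assume the context was captured one level up, so every
 * frame visited is wanted and the guard disappears. */
static size_t walk_from_caller(void *const *frames, size_t nframes,
                               void **address, size_t max)
{
    size_t depth = 0;
    for (size_t i = 1; i < nframes && depth < max; ++i)
        address[depth++] = frames[i];
    return depth;
}
```

Both variants fill 'address' with the same entries; the second just moves the 
one-level adjustment out of the loop and into the starting point.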


Attachment: 01-getcontext-lite.patch
Description: Binary data
