Re: [Chicken-hackers] [PATCH] Add a statistical profiler
From: Andy Bennett
Subject: Re: [Chicken-hackers] [PATCH] Add a statistical profiler
Date: Mon, 04 Jan 2016 17:03:36 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.8.0
Wow!
This is excellent and just what I need.
Thanks Peter!
> For a while now I've been meaning to improve our profiler. I finally
> got around to writing a new profiler a few weeks ago and I've been
> testing it on and off since then. I think now it's ready to start
> integrating it. I've successfully made some performance improvements in
> uri-generic and intarweb. The irregex performance improvement I sent to
> chicken-hackers was also a result from running it with the new profiler.
>
> I believe the profiler needs improving because of these limitations:
>
> * Instrumentation adds a whole lot of overhead and prevents various
> optimizations due to requiring dynamic-wind and turning all calls
> to such procedures into a CPS call. This skews the results,
> sometimes so extremely that it makes the profile output useless.
> If you want to see an example of this, try profiling the "nbody"
> benchmark from Mario's chicken-benchmarks repo. It goes from
> 1s to 17s on my laptop. Other examples include "knucleotide",
> "primes" and "ray2".
> * Only toplevel procedures are instrumented. This means that programs
> with large procedures or programs which rely heavily on closures will
> be hard to profile. It's possible to work around this by tweaking
> the source code, but this is very annoying to do.
> * Checking performance of a program requires recompilation with
> -profile. This is only a minor annoyance, but sometimes you
> want to look "deeper" inside the library calls that your program
> performs, and that means you'll have to recompile those libraries
> with -profile as well (and possibly libraries they use, and so on),
> which gets annoying pretty quickly. Also, you can easily forget you
> did that for a library, and this will slow down all programs using
> this library.
>
> The attached patches add a *second* profiler to CHICKEN, which uses
> statistical sampling to figure out where a program spends most of its
> time. It registers a timer with setitimer(). This will trigger a
> signal to be delivered every 10ms (this is customisable via "-:P").
>
> When a signal arrives, we look at the top of the trace buffer to
> find the currently running procedure. This means it needs *no*
> additional instrumentation or even recompilation: libraries or
> programs which are compiled without -no-trace or -d0 (i.e., most
> of them) are instantly profileable!
>
> This statistical sampling method is basically what gprof(1) and profil(3)
> also do, from what I understand of their documentation.
>
> To see it in action, just compile the target program as normal, and then
> when running it, pass it the "-:p" switch. This is recognised by the
> runtime system and therefore any program can be passed this switch.
> On exit, it will write out a "PROFILE.1234" file, where 1234 is the pid,
> just like under the instrumentation-based profiler. This can only be
> read with the new version of chicken-profile, which I've tweaked a tiny
> bit to understand the new profile format, which has a header now.
>
> The tweak to chicken-profile is necessary because the statistical
> profiler has very little context to work with: it only sees the top
> of the trace buffer, so it knows what procedure is currently running.
> Because it takes a sample every 10 ms, it can count how much time
> is spent in that procedure, but it doesn't really know how it got there,
> as it can't carefully track procedure entry/exit events like the
> instrumentation does under the old profiler.
>
> So, while we know how much time is spent in a procedure, we can only
> calculate the percentage of the entire running time spent, individually
> per procedure. This means the percentages add up to 100, which is different
> from the original profiler. There, the percentage will be 100 for the
> "main" procedure, and less for everything called by it. Because of this,
> both profilers now write a "header" symbol to the start of the profile,
> which chicken-profile recognises and adapts its behaviour to.
>
> The new profiler has none of the disadvantages of the old profiler, but
> comes with its own set of caveats:
>
> * If a program blocks signals, profiling is unreliable. I got a report
> from Kooda that SDL's vsync() seems to do this, so this is not always
> directly under the programmer's control.
> * The profile results are very different: it shows how much time is
> spent in the procedure that was called last, making it very low-level.
> This can sometimes result in a harder-to-read profile, with lots of
> detail.
> The old profiler would simply state "procedure X is slow", while the
> new one will say "call (A) on line 1 is slowest followed by call (B) on
> line 1507 and call (C) on line 5", even though lines 1 and 5 are both
> part of procedure X. Ideally you would want some way to sum the
> calls belonging to the same procedure. On the other hand, it can
> also be useful to know what each *part* of a procedure contributes
> to the profile.
> * Due to the statistical approach, the call counts can be *completely*
> wrong. If you have very fast procedures which are called extremely
> often, the number of calls will be way too low because it has missed
> several calls in between two samples.
>
> Finally, I should mention two more caveats, but I don't think these are
> huge problems:
>
> * Due to the statistical approach, the time spent can be over- or
> under-allocated. We simply allocate the 10ms to the currently running
> procedure at the time we take the sample. This should in general not
> be a problem because we take so many samples that it will average out:
> In the long run, small, fast procedures should not be running too often
> while a sample is taken.
> * The code is all in C; this is required to reduce the impact on
> performance as much as possible, and to avoid having to recompile.
> Also, code running with interrupts disabled would not receive the
> signal if we processed the signal in Scheme. Finally, the profiler
> might always see itself running when taking a sample if we did it in
> Scheme!
>
> Currently, profiling only works for compiled programs, but I think with
> some more or less invasive changes it may be possible to even profile
> programs running under the interpreter. This could not be made to work
> with the old profiler, as far as I can see.
>
> Given the above laundry list of caveats and the elegant approach of
> the old profiler (all in Scheme, very little code, precise and more
> high-level results), I think we should not be removing that one just
> yet. In fact, I think we could even merge parts of the two profilers'
> code, which should also remove some of the overhead of the old profiler,
> reducing some of its problems.
>
> I'm sure the above text sounds very rambling, so please feel free to
> ask if you have questions. I'm also preparing a blog post that
> describes the new profiler in a more structured and easy-going way.
>
> There's a lot more we could potentially do to improve either profiler,
> but let's get started by applying this first.
>
> Cheers,
> Peter
>
>
>
> _______________________________________________
> Chicken-hackers mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/chicken-hackers
>
Regards,
@ndy
--
address@hidden
http://www.ashurst.eu.org/
0290 DA75 E982 7D99 A51F E46A 387A 7695 7EBA 75FF