[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Makefile and gprof problem
From: |
Paul Pluzhnikov |
Subject: |
Re: Makefile and gprof problem |
Date: |
Fri, 29 Dec 2006 10:33:42 -0800 |
User-agent: |
Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.4 (Jumbo Shrimp, linux) |
RS <rsina_no.ssppaamm@comcast.net> writes:
>>> real 0m57.768s
>>> user 0m40.878s
>>> sys 0m7.812s
>> This says: most of the time is spent in user-level code.
>> Such code consists of:
>> 1. your own code (main exe, libraries, etc.), and
>> 2. "system" libraries (libc.dylib (or whatever it's called on OSX))
>
> OK Thanks, so most of the time is in the user level, which it makes
> sense because my code has many calls to 2D convolutions, linear algebra,
> and other user defined code that one would expect take some time to run.
Hmm. That would imply that gprof is not working for you ...
You should always start with a simple test case so you have some
confidence in your tools. Here is a "proof" that gprof works on
my system:
[BEGIN PROOF]
$ cat spin.c
#include <stdlib.h>
void spin(int x)
{
for (int i = 0; i < x; ++i)
;
}
void call_spin(int x)
{
for (int i = 0; i < x; ++i)
spin(x);
}
int main(int argc, char *argv[])
{
int count = 1000;
if (1 < argc) count = atoi(argv[1]);
call_spin(count);
return 0;
}
$ gcc --std=c99 spin.c -g -pg
$ time ./a.out 40000
real 0m14.347s
user 0m14.335s
sys 0m0.005s
$ gprof ./a.out
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
100.59 144.17 144.17 40000 0.00 0.00 spin
0.01 144.19 0.02 1 0.02 144.19 call_spin
Note that total seconds is very close to 10 * 'user time'.
This is the first time I see such discrepancy, and a quick google
search didn't reveal anything obvious.
It probably has to do with HZ=100 in Linux 2.4 but HZ=1000 in
2.6 kernels.
Also note that number of calls to spin() and call_spin() is correct.
granularity: each sample hit covers 2 byte(s) for 0.01% of 144.19 seconds
index % time self children called name
0.02 144.17 1/1 main [2]
[1] 100.0 0.02 144.17 1 call_spin [1]
144.17 0.00 40000/40000 spin [3]
-----------------------------------------------
<spontaneous>
[2] 100.0 0.00 144.19 main [2]
0.02 144.17 1/1 call_spin [1]
-----------------------------------------------
144.17 0.00 40000/40000 call_spin [1]
[3] 100.0 144.17 0.00 40000 spin [3]
-----------------------------------------------
All of the above also make perfect sense.
[END PROOF]
If the output on your system differes from the above (e.g. you get
no samples and 0.0 for 'self'), then clearly:
- something is wrong with your system (or with gprof on Mac OS X), and
- looking at results from your own program is pointless, since a
trivial test doesn't work.
> But I don't see how then
> one can in general use gprof, because most of the time one doesn't have
> access to system libraries that were built with -pg. I mean, one can
> not recompile libc.a every time one wants to use a profiler
You don't need to recompile libc.a every time.
You compile it once, call it libc_p.a, and arrange to link against
it when doing profiling. The exact details of how this is done are
system-specific, and I have zero experience with Mac OS; perhaps
someone else can guide you better.
> Well, I know for fact that my user defined code takes some time to run
> (extensive use of loops and user defined convoluted functions).
In that case my previous theory doesn't hold water, and you must
seek alternative explanation. One possible explanation is that gprof
is broken on your system (or on Mac OS X; or on specific revisions
of Mac OS and gcc ...).
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.