[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Makefile and gprof problem

From: Paul Pluzhnikov
Subject: Re: Makefile and gprof problem
Date: Fri, 29 Dec 2006 10:33:42 -0800
User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.4 (Jumbo Shrimp, linux)

RS <> writes:

>>> real    0m57.768s
>>> user    0m40.878s
>>> sys     0m7.812s

>> This says: most of the time is spent in user-level code.
>> Such code consists of:
>> 1. your own code (main exe, libraries, etc.), and
>> 2. "system" libraries (libc.dylib (or whatever it's called on OSX))
> OK Thanks, so most of the time is in the user level, which it makes
> sense because my code has many calls to 2D convolutions, linear algebra,
> and other user defined code that one would expect take some time to run.

Hmm. That would imply that gprof is not working for you ...

You should always start with a simple test case so you have some
confidence in your tools. Here is a "proof" that gprof works on
my system:


    $ cat spin.c
    #include <stdlib.h>

    void spin(int x)
      for (int i = 0; i < x; ++i)

    void call_spin(int x)
      for (int i = 0; i < x; ++i)

    int main(int argc, char *argv[])
      int count = 1000;
      if (1 < argc) count = atoi(argv[1]);
      return 0;

    $ gcc --std=c99 spin.c -g -pg
    $ time ./a.out 40000

    real  0m14.347s
    user  0m14.335s
    sys   0m0.005s

    $ gprof ./a.out
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls   s/call   s/call  name    
    100.59    144.17   144.17    40000     0.00     0.00  spin
      0.01    144.19     0.02        1     0.02   144.19  call_spin

Note that total seconds is very close to 10 * 'user time'.

This is the first time I see such discrepancy, and a quick google
search didn't reveal anything obvious.

It probably has to do with HZ=100 in Linux 2.4 but HZ=1000 in
2.6 kernels.

Also note that number of calls to spin() and call_spin() is correct.

    granularity: each sample hit covers 2 byte(s) for 0.01% of 144.19 seconds

    index % time    self  children    called     name
                    0.02  144.17       1/1           main [2]
    [1]    100.0    0.02  144.17       1         call_spin [1]
                  144.17    0.00   40000/40000       spin [3]
    [2]    100.0    0.00  144.19                 main [2]
                    0.02  144.17       1/1           call_spin [1]
                  144.17    0.00   40000/40000       call_spin [1]
    [3]    100.0  144.17    0.00   40000         spin [3]

All of the above also make perfect sense.

If the output on your system differes from the above (e.g. you get
no samples and 0.0 for 'self'), then clearly:

- something is wrong with your system (or with gprof on Mac OS X), and
- looking at results from your own program is pointless, since a
  trivial test doesn't work.

> But I don't see how then
> one can in general use gprof, because most of the time one doesn't have
> access to system libraries that were built  with -pg. I mean, one can
> not recompile libc.a every time one wants to use a profiler

You don't need to recompile libc.a every time.

You compile it once, call it libc_p.a, and arrange to link against
it when doing profiling. The exact details of how this is done are
system-specific, and I have zero experience with Mac OS; perhaps
someone else can guide you better.

> Well, I  know for fact that my user defined code takes some time to run
> (extensive use of loops and user defined convoluted functions).

In that case my previous theory doesn't hold water, and you must
seek alternative explanation. One possible explanation is that gprof
is broken on your system (or on Mac OS X; or on specific revisions
of Mac OS and gcc ...).

In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]