Re: [Libunwind-devel] libunwind with LD_PRELOAD option
From: Lassi Tuura
Subject: Re: [Libunwind-devel] libunwind with LD_PRELOAD option
Date: Mon, 5 Sep 2011 20:40:59 +0200
Hi,
> Here is the malloc example
> void *malloc(size_t size)
> {
> void *p;
>
> if(!origMallocFp)
> getInstance(); => This is done with pthread_once.
If I understand your quote correctly, this may end up calling dlsym(), which
may internally call malloc(). You are not really pasting enough code here to
tell for sure that your code is problem free; it's hard to reason about the code
based on the information at hand. You might want to review your code with a
very critical eye on all calls.
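
To make the hazard concrete, here is a minimal sketch of one common way to
survive a resolver that allocates: serve requests made during resolution from a
small static arena. All names here (wrapped_malloc, boot_alloc,
resolve_real_malloc and so on) are hypothetical and not taken from the quoted
code. To keep it runnable as an ordinary program, the resolver is a stand-in
that deliberately allocates, mimicking what dlsym() might do internally; in a
real LD_PRELOAD library the wrapper would be malloc() itself. The sketch is
single-threaded and ignores the pthread_once logic mentioned in the quote.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef void *(*malloc_fn)(size_t);

static malloc_fn real_malloc;      /* resolved on first use */
static int resolving;              /* nonzero while the resolver runs */
static _Alignas(16) unsigned char boot_arena[4096];
static size_t boot_used;           /* bytes handed out from the arena */
static int boot_calls;             /* how often the arena was needed */

/* Bump allocator for requests made during resolution; never freed. */
static void *boot_alloc(size_t size)
{
    if (size > sizeof boot_arena)
        return NULL;
    size_t rounded = (size + 15u) & ~(size_t)15u;
    if (rounded > sizeof boot_arena - boot_used)
        return NULL;
    void *p = boot_arena + boot_used;
    boot_used += rounded;
    assert(boot_used <= sizeof boot_arena);
    boot_calls++;
    return p;
}

void *wrapped_malloc(size_t size);

/* Hypothetical stand-in for dlsym(RTLD_NEXT, "malloc"). It allocates
 * through the wrapper on purpose, to mimic a resolver that calls
 * malloc() internally. */
static malloc_fn resolve_real_malloc(void)
{
    void *scratch = wrapped_malloc(32);
    (void)scratch;
    return malloc;
}

void *wrapped_malloc(size_t size)
{
    if (!real_malloc) {
        if (resolving)             /* re-entered from the resolver */
            return boot_alloc(size);
        resolving = 1;
        real_malloc = resolve_real_malloc();
        resolving = 0;
        if (!real_malloc)
            abort();
    }
    return real_malloc(size);
}

int is_boot_pointer(const void *p)
{
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)boot_arena;
    return a >= lo && a < lo + sizeof boot_arena;
}

/* Arena pointers must never reach the real free(). */
void wrapped_free(void *p)
{
    if (p && !is_boot_pointer(p))
        free(p);
}
```

Without the resolving flag, the first wrapped_malloc() call would recurse into
the resolver without end. The matching free wrapper matters just as much:
handing an arena pointer to the real free() would corrupt the heap.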
> formStackPacket (packet+pktHdrLen, (unsigned int *)bt, numEntries);
> rssSend (packet, sizeof (unsigned int) * (numEntries + pktHdrLen));
Black boxes, hard to say what they do. Could they allocate memory or otherwise
end in trouble?
It could be something as simple as these bits have an error path which gets
fired when you send more data with the full stack trace, and the error path
does some memory allocation. Without stack tracing you might never hit the
error path.
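
A per-thread re-entrancy guard is one way to make the hook robust against
such hidden allocations. The following is only a sketch under assumptions:
traced_malloc and report_allocation are hypothetical names, and
report_allocation stands in for the black-box packet and send routines by
allocating scratch memory, as an error path or a growing buffer might. The
wrapper is not named malloc so that the example runs as a plain program.

```c
#include <stddef.h>
#include <stdlib.h>

static _Thread_local int in_hook;  /* nonzero while this thread is tracing */
static int reports_sent;           /* allocations that were reported */
static int nested_allocs;          /* allocations made by the tracer itself */

void *traced_malloc(size_t size);

/* Hypothetical reporting path. It allocates, which would re-enter the
 * hook and recurse forever if nothing stopped it. */
static void report_allocation(void *ptr, size_t size)
{
    void *scratch = traced_malloc(128);
    (void)ptr;
    (void)size;
    free(scratch);
    reports_sent++;
}

void *traced_malloc(size_t size)
{
    void *p = malloc(size);
    if (in_hook) {                 /* called from inside the tracer */
        nested_allocs++;
        return p;                  /* satisfy it, but do not trace it */
    }
    in_hook = 1;
    if (p)
        report_allocation(p, size);
    in_hook = 0;
    return p;
}
```

Counting the nested allocations, as done here, is also a cheap way to find out
whether the reporting path allocates at all, and whether it does so more often
once full stack traces are attached.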
> Yes, i experienced this when i first tried with glibc backtrace() and also
> printfs when i first started.
> Hence i removed all that and this works fine. For days together i can profile
> the app and get the stats.
> Without stack trace, this is only half the job done and teams take longer to
> find the exact place of leak :-(
Unfortunately that doesn't say much. You could just be lucky and not call
anything which triggers problems. For example if you add stacks to your network
stuff, maybe it exceeds some threshold and does some allocation, or hits an
error path you don't otherwise trigger, or ...?
> Also, when i don't link with -lunwind, the code is stable. I have tried with
> different versions of the app and it is consistent.
> So there is no recursive malloc hazard without unwind for sure.
It's a data point, but could just be circumstantial. It's hard to say for sure
from the data alone.
> Great. Did you use LD_PRELOAD trick? It is so appealing because of it's ease
> of instrumentation.
No, we inject a hook into functions by rewriting the function prologue on the
fly.
> That's why i am not giving up yet to get the backtrace. The target is a
> small device with nand based filesystem and cannot hold huge data. Hence i
> send it to host for post processing.
Let me throw a few ideas here, though extensive follow-up would probably be
better off the libunwind list.
On x86-64 we use libunwind to capture stack trace (ia32 uses something else) on
every allocation. Each allocation is associated with its full stack trace, and we
can dump this "heap snapshot" at any time during the run, or at the end as a
final profile result. We use these for leak checking, identifying peak use,
general allocation profiling, correlating performance and allocation behaviour,
looking for churn, delta comparisons between runs/versions, fragmentation and
locality studies, etc. The heap snapshots are many orders of magnitude smaller
than the entire stream of stack traces on allocation would be.
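
The idea of keeping totals per unique stack, rather than emitting every trace,
can be sketched roughly as follows. This is an illustration under assumptions
and not the data structure described in this mail: the names, the table sizes,
and the linear scan over live allocations are all made up for brevity, and a
stack is simply an array of return addresses, however it was captured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAMES 8     /* illustrative; much deeper in practice */
#define N_SITES    64    /* capacity for unique stacks (assumed) */
#define N_LIVE     256   /* capacity for live allocations (assumed) */

struct site {            /* one unique stack trace and its live totals */
    int used, depth;
    uintptr_t frames[MAX_FRAMES];
    size_t live_count, live_bytes;
};
struct live {            /* one outstanding allocation; ptr 0 = free slot */
    uintptr_t ptr;
    size_t size;
    struct site *site;
};
static struct site sites[N_SITES];
static struct live lives[N_LIVE];

static size_t hash_words(const uintptr_t *w, int n)  /* FNV-style mixing */
{
    uint64_t h = 1469598103934665603ull;
    for (int i = 0; i < n; i++)
        h = (h ^ w[i]) * 1099511628211ull;
    return (size_t)h;
}

/* Find the site for this stack, creating it on first sight. */
static struct site *find_site(const uintptr_t *frames, int depth)
{
    if (depth > MAX_FRAMES)
        depth = MAX_FRAMES;
    size_t bytes = (size_t)depth * sizeof *frames;
    size_t i = hash_words(frames, depth) % N_SITES;
    for (int n = 0; n < N_SITES; n++, i = (i + 1) % N_SITES) {
        struct site *s = &sites[i];
        if (!s->used) {
            s->used = 1;
            s->depth = depth;
            memcpy(s->frames, frames, bytes);
            return s;
        }
        if (s->depth == depth && memcmp(s->frames, frames, bytes) == 0)
            return s;
    }
    return NULL;         /* table full */
}

/* Linear scan for brevity; a real tracker would hash this too. */
static struct live *find_live(uintptr_t ptr)
{
    for (int i = 0; i < N_LIVE; i++)
        if (lives[i].ptr == ptr)
            return &lives[i];
    return NULL;
}

int record_alloc(void *ptr, size_t size, const uintptr_t *frames, int depth)
{
    struct site *s = ptr ? find_site(frames, depth) : NULL;
    struct live *l = s ? find_live(0) : NULL;
    if (!l)
        return -1;       /* null pointer or out of table space */
    l->ptr = (uintptr_t)ptr;
    l->size = size;
    l->site = s;
    s->live_count++;
    s->live_bytes += size;
    return 0;
}

void record_free(void *ptr)
{
    struct live *l = ptr ? find_live((uintptr_t)ptr) : NULL;
    if (!l)
        return;          /* not tracked */
    assert(l->site->live_count > 0);
    l->site->live_count--;
    l->site->live_bytes -= l->size;
    l->ptr = 0;          /* slot becomes free again */
}

/* Snapshot: bytes still live, and how many distinct stacks own them. */
size_t snapshot(size_t *n_stacks)
{
    size_t bytes = 0;
    *n_stacks = 0;
    for (int i = 0; i < N_SITES; i++)
        if (sites[i].live_count) {
            bytes += sites[i].live_bytes;
            ++*n_stacks;
        }
    return bytes;
}
```

The size of a snapshot grows with the number of distinct stacks, not with the
number of allocations, which is why it stays so much smaller than a stream of
traces.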
The applications we profile generate a prodigious number of allocation samples,
on average 40-level-deep stacks from 700 or so shared libraries, 1-3 million
times a second. It's not unusual for us to track ~7-10 million concurrently live
allocations. The apps run anywhere from ~15 minutes to 24 hours.
A long time ago we used to generate a serialised stream of stack traces, like
you appear to do, then absorb it in a collector to summarise it. We moved away from
doing that because there was no way to deal with the data stream at the rate it
was produced, even if the consumer was multi-threaded and used numerous tricks
to speed up consuming the stack trace data. But maybe your data rate isn't as
high as ours...
We've settled on a data structure which is moderate enough in extra size (=
needs <100% extra virtual memory) and is fast enough to update (~140% run time
increase at 1MHz, vs. x10-20 for valgrind), and handles multi-threaded apps
too. If the allocation rate is less frantic, the overhead is less, much less. The
heap snapshots are a very manageable size, about 30MB compressed per 1-2 GB of
VSIZE.
I don't know what sort of constraints you have on your target device, or what
your target app's behaviour is, but my experience was that summarising the
allocation data in-process, in virtual memory, was by far the winner. YMMV, much
depends on how much extra RAM you can expend, and what sort of allocation rate
you experience, and other factors.
Regards,
Lassi