libunwind-devel

Re: [Libunwind-devel] libunwind with LD_PRELOAD option


From: Shan Shan
Subject: Re: [Libunwind-devel] libunwind with LD_PRELOAD option
Date: Mon, 5 Sep 2011 17:30:21 +0100

Hi Lassi,

Thanks. Please see inline.

On Mon, Sep 5, 2011 at 5:02 PM, Lassi Tuura <address@hidden> wrote:
Hi,

>   I am trying to implement a custom heap profiler for my ARM9 (arm926ejs) board.
>   Basically I have my own malloc and free, and load them using LD_PRELOAD before I profile
>   the app. I wanted to add a stack trace for each alloc/free. I send the LR return address
>   from each stack frame to a host, where I do the symbol mapping and convert it to a
>   human-readable format.
>
>   The first part works fine, but when I link with libunwind, the app crashes after sending
>   some 20K entries. I tried a simple application that does malloc and free with default gcc
>   options and it works fine. This app is compiled with -O3 for some libraries and
>   -Os for others; basically it is a big, beastly app. Here are the details

You don't mention where sending the stacks to the other machine fits into your
workflow, or give exact information about what causes the crash.

Here is the malloc example:

void *malloc(size_t size)
{
    void *p;

    if (!origMallocFp)
        getInstance(); /* This is done with pthread_once. */

    pthread_mutex_lock(&memTraceMutex);
    p = origMallocFp(size);
    if (p && socketInitialized)
    {
        void *bt[BACKTRACE_PTRS_SIZE];
        unsigned int numEntries;
        numEntries = mybacktrace ((unsigned int *)bt);
        unsigned int packet[64];
        unsigned int pktHdrLen;
        packet[0] = htonl(PKT_TYPE_MALLOC);
        packet[1] = htonl((unsigned int)p);
        packet[2] = htonl((unsigned int)(syscall(SYS_gettid)));
        packet[3] = htonl((unsigned int)size);
        packet[4] = htonl((unsigned int)numEntries);
        pktHdrLen = 5;
        formStackPacket (packet+pktHdrLen, (unsigned int *)bt, numEntries);
        rssSend (packet, sizeof (unsigned int) * (numEntries + pktHdrLen));
    }
    pthread_mutex_unlock(&memTraceMutex);

    return p;
}
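
One thing worth checking in the code above is the packet size: packet[] holds 64 words and the header takes 5, so at most 59 frame addresses fit. If BACKTRACE_PTRS_SIZE (whose value is not shown) is larger than 59, a deep stack could write past the end of packet[]. The sketch below shows one way to clamp the frame count before filling the packet. It is only an illustration under assumptions: build_malloc_packet, PKT_WORDS and PKT_HDR_WORDS are hypothetical names, the PKT_TYPE_MALLOC value is made up, and the copy loop stands in for formStackPacket(), assuming that function stores each address in network byte order.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stddef.h>
#include <stdint.h>

#define PKT_WORDS       64u  /* matches packet[64] in the posted code */
#define PKT_HDR_WORDS   5u   /* matches pktHdrLen = 5 */
#define PKT_TYPE_MALLOC 1u   /* assumed value; the real constant is not shown */

/*
 * Fill 'packet' with the 5-word header followed by the frame addresses.
 * Frames that do not fit in the remaining 59 words are dropped.
 * Returns the number of 32-bit words that should be sent.
 */
size_t build_malloc_packet(uint32_t packet[PKT_WORDS],
                           uint32_t ptr, uint32_t tid, uint32_t size,
                           const uint32_t *frames, size_t num_frames)
{
    const size_t max_frames = PKT_WORDS - PKT_HDR_WORDS;
    size_t i;

    /* Clamp so the header plus frames never exceeds the buffer. */
    if (num_frames > max_frames)
        num_frames = max_frames;

    packet[0] = htonl(PKT_TYPE_MALLOC);
    packet[1] = htonl(ptr);
    packet[2] = htonl(tid);
    packet[3] = htonl(size);
    packet[4] = htonl((uint32_t)num_frames);

    /* Stand-in for formStackPacket(): copy addresses in network order. */
    for (i = 0; i < num_frames; i++)
        packet[PKT_HDR_WORDS + i] = htonl(frames[i]);

    return PKT_HDR_WORDS + num_frames;
}
```

The return value is a word count, so the caller would multiply it by sizeof(uint32_t) to get the byte length to send, as the posted code does with rssSend().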

In getInstance(), the malloc function pointer is initialized as below.

    origMallocFp = dlsym(RTLD_NEXT, "malloc");
    if ((error = dlerror()) != NULL)
    {
        origMallocFp = NULL;
        return;
    }


 
If you are writing a 'malloc' profiler, you need to be very careful what you
do while inside the 'malloc' call. If you are sending stacks elsewhere in the
callback itself, you have to be extremely careful not to trigger, for example,
recursive calls to malloc.
Yes, I ran into this when I first started, with glibc backtrace() and with printfs.
I removed all of that and it now works fine: I can profile the app for days on end and get the stats.
Without a stack trace, though, the job is only half done, and teams take longer to find the exact location of a leak :-(
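
A common way to guard against the recursion described above is a per-thread flag that makes the tracing step a no-op while the same thread is already inside it. The sketch below is only an illustration, not the code from my profiler: traced_alloc_hook, record_alloc, in_hook and recorded are hypothetical names, and record_alloc deliberately re-enters the hook to simulate an allocation made by the tracing code itself. It assumes the toolchain supports C11 _Thread_local and that accessing the thread-local variable does not itself allocate, which would need to be checked on the target.

```c
#include <stddef.h>

/* Hypothetical names throughout; none of them come from the real profiler. */
static _Thread_local int in_hook;   /* nonzero while this thread is tracing */
static unsigned long recorded;      /* how many allocations were recorded */

void traced_alloc_hook(void *p, size_t size);

/* Stand-in for the unwind-and-send step. */
static void record_alloc(void *p, size_t size)
{
    recorded++;
    /* Simulate a nested allocation made by the tracing code itself. */
    traced_alloc_hook(p, size);
}

/* Called by the malloc wrapper after the real allocation succeeds. */
void traced_alloc_hook(void *p, size_t size)
{
    if (in_hook)
        return;          /* already tracing on this thread: do not recurse */

    in_hook = 1;
    record_alloc(p, size);
    in_hook = 0;
}
```

With the guard in place, the nested call returns immediately, so each outer allocation is recorded exactly once.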
 
You will also get calls from various bits of your libc, so you need to be
certain you do not do anything which will destabilise those. Beware that all
sorts of places call 'malloc', some of which you may find unexpected.

Yes. I have pasted above the code for malloc.

Also, when I don't link with -lunwind, the code is stable. I have tried with different versions of the app and it is consistent.
So there is certainly no recursive malloc hazard without unwind.

Even when I do not call any of the unwind APIs (as mentioned in step 7), the app crashes, so this happens only when linking with -lunwind. It runs for a minute or so and then crashes. Within that minute, ~20K allocs/frees happen because of constructors.

 
The unwind code you posted seems reasonable. I don't know the ARM implementation
of libunwind, but I assume it doesn't make hazardous malloc calls; on x86-64
your implementation should be safe. I'd check the code which calls this first.

I've written one malloc profiler (called igprof), and we regularly profile
apps which allocate at a 1-3 MHz rate, consist of hundreds of shared libraries
and run for hours, so I can offer advice on some practical aspects. I know
little about the ARM tool chain, however.
Great. Did you use the LD_PRELOAD trick? It is so appealing because of its ease of instrumentation. That's why I am not giving up yet on getting the backtrace. The target is a small device with a NAND-based filesystem and cannot hold huge amounts of data. Hence I send it to a host for post-processing.

Any advice/tips would be really great :-). Appreciate your help

Thanks

Regards,
Lassi


