libunwind-devel

Re: [Libunwind-devel] 10% lost unwind traces on x86-64?


From: Lassi Tuura
Subject: Re: [Libunwind-devel] 10% lost unwind traces on x86-64?
Date: Tue, 9 Mar 2010 19:16:36 +0100

Hi,

Thanks Arun.

>> - Suspiciously large fraction of failures occur at (function+0), i.e. at 
>> function entry address.
> 
> This has been discussed before:
> 
> http://thread.gmane.org/gmane.comp.lib.unwind.devel/284/focus=296
> 
> There is a patch in that thread that might be useful for solving this.

Thanks, I'll try to digest that :-)

> For the remaining problems, I'd suggest:
> 
> * Trying a new libc
> 
> If the problem goes away, someone added missing unwind info.

I'll try on other systems, but I am afraid we're stuck with RHEL5 for now. If 
in the end this is the only fix, I'll put that forward, but I don't really 
expect they'd bite. At best I imagine we might get a handful of custom boxes 
for profiling work, but it would be a real pain from a user support point of
view.

> * Examining readelf -wf for the code in question
> 
> This is a manual step. If you can prove that the compiler modified the
> stack pointer and forgot to generate unwind info, try testing a more
> recent compiler.

Yes, this is exactly what I was doing. I had GDB attached to the program, and
whenever my program detected an anomalous stack trace, I made a full libunwind
stack dump, had GDB dump the same stack trace, and then manually inspected each
of the anomalies: the address, the disassembly and the readelf unwind dumps.

The result was the five categories. I have not come across anything else yet in 
the hundreds of these I investigated. If you want the gory details I can post 
them.
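
For reference, the comparison step described above can be sketched roughly as
follows. This is a minimal Python illustration, not the tooling actually used:
the symbol table, the addresses and the helper names (first_divergence,
symbolize) are all hypothetical. It finds the first frame where the libunwind
trace departs from the GDB trace and expresses an address as function+offset,
which is how failures at function+0 stand out.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical symbol table: (start address, name), sorted by start address.
SYMBOLS = [(0x1000, "foo"), (0x1040, "bar"), (0x10A0, "baz")]

def first_divergence(unwound: List[int], reference: List[int]) -> Optional[int]:
    """Index of the first frame where the two traces differ, or None if equal."""
    for i, (a, b) in enumerate(zip(unwound, reference)):
        if a != b:
            return i
    if len(unwound) != len(reference):
        # One trace is a strict prefix of the other (e.g. a truncated unwind).
        return min(len(unwound), len(reference))
    return None

def symbolize(addr: int, symbols: List[Tuple[int, str]]) -> Optional[Tuple[str, int]]:
    """Map an address to (function name, offset from function start)."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    start, name = symbols[i]
    return name, addr - start
```

An offset of 0 from symbolize() corresponds to the (function+0) case mentioned
at the top of this mail.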

> * Signal frames libunwind doesn't understand
> 
> I haven't seen weird calling conventions in practice yet. But signals
> are of two types:
> 
> * IP points to the instruction *after* the one that triggered the signal
> * IP points to the instruction *before* the one that triggered the signal
> 
> libunwind doesn't distinguish between the two yet.

Right, I think mine (SIGPROF) is of the first kind, i.e. the saved %rip is
where execution will resume. What does libunwind assume for the "ip" of the
dwarf cursor? Is it the instruction to be executed next, or the one already
executed? Maybe I can knock up a patch for this.
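
To illustrate why the answer matters, here is a minimal Python sketch of the
convention used by GCC's DWARF unwinder: for an ordinary call frame the unwind
info is looked up at ip - 1, because a return address may sit one past the end
of the calling function, while for a signal frame the saved ip is used as is.
Whether libunwind's cursor follows the same rule is exactly the open question
here; the table, names and addresses below are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical unwind table: (start, end, name), covering [start, end).
FDES = [(0x1000, 0x1040, "caller"), (0x1040, 0x10A0, "callee")]

def find_fde(pc: int, fdes: List[Tuple[int, int, str]]) -> Optional[str]:
    """Return the name of the entry whose address range contains pc."""
    for start, end, name in fdes:
        if start <= pc < end:
            return name
    return None

def lookup_pc(ip: int, is_signal_frame: bool) -> int:
    """Address to use for the table lookup, given the frame's saved ip."""
    # Call frame: ip is a return address, so step back into the call insn.
    # Signal frame: ip is the interrupted instruction itself, use it unchanged.
    return ip if is_signal_frame else ip - 1

# A signal arriving at the first instruction of "callee" (callee+0):
entry = 0x1040
print(find_fde(lookup_pc(entry, True), FDES))   # callee
print(find_fde(lookup_pc(entry, False), FDES))  # caller
```

The second lookup shows how treating a signal frame as a call frame picks the
unwind info of the preceding function when the interrupted ip is at function+0.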

I've not had problems getting through the signal frame.

Regards,
Lassi


