Re: [libunwind] libunwind segv with gcc 2.96 programs run on RedhatEL 3 with GLIBC 2.3.2
Thu, 12 Feb 2004 10:29:23 +1100
Resend using the userid that is subscribed to this list.
On Wed, 11 Feb 2004 11:34:18 -0500,
"Harrow, Jerry" <address@hidden> wrote:
>If gdb is patched to use libunwind, does that mean you cannot debug
>applications built with Gcc 2.96.
gcc 2.96 generates incorrect assembler code for any function that
contains a switch statement. It embeds the switch as data inside the
function which (according to DavidM) violates the ia64 ABI. The
embedded switch data in turn generates inconsistent unwind information:
the code length for the function recorded in the unwind information is
less than the real function code length.
Any calls to other functions that lie toward the end of a function with
a short unwind length will appear to return outside the calling
function, as far as unwind is concerned. That results in the unwinder
using incorrect information to trace back through the function with a
switch statement. Once the unwinder gets incorrect information then
you can forget about any backtrace.
When I build kernels using gcc 2.96, I have to run the kernel and the
kernel modules through an 860-line Perl script. That script detects any
mismatch between the unwind data and the actual length of each function
and attempts to correct the incorrect unwind data generated by gcc
2.96. Without such a correction, you can forget about backtracing gcc
2.96 code using the unwinder.
linux/arch/ia64/kernel/traps.c::ia64_fault was the worst example. It
is one big switch statement with a call at the end to die_if_kernel().
When compiled with gcc 2.96, ia64_fault has a length of 506 bundles but
unwind only records 472 bundles. When ia64_fault calls die_if_kernel,
the return address from die_if_kernel to ia64_fault is after the end of
the function, at least according to the unwind data. As a result,
the backtrace terminates at ia64_fault and you get no information at
all about what really caused the fault.
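
To make the arithmetic concrete, here is a minimal Python sketch of the
range check the unwinder is effectively doing. The bundle counts are
the ones quoted above; the load address and the bundle holding the
return address are made-up values for illustration, and the function
name is hypothetical, not part of libunwind.

```python
BUNDLE_BYTES = 16  # an ia64 instruction bundle is 128 bits

def unwind_covers(func_start, bundles, addr):
    """Return True if addr lies within `bundles` bundles of func_start."""
    return func_start <= addr < func_start + bundles * BUNDLE_BYTES

func_start = 0x10000     # hypothetical load address of ia64_fault
real_bundles = 506       # actual code length, from the message above
unwind_bundles = 472     # length recorded in the unwind data

# Assumption: the return address from die_if_kernel() lands in the
# last bundle of the function, since the call sits at the very end.
ret_addr = func_start + 505 * BUNDLE_BYTES

in_real_code = unwind_covers(func_start, real_bundles, ret_addr)
in_unwind_range = unwind_covers(func_start, unwind_bundles, ret_addr)
print("inside real code:   ", in_real_code)
print("inside unwind range:", in_unwind_range)
```

Any return address in the last 34 bundles fails the second check, so
the unwinder has no valid descriptor for it and the trace stops there.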
I expect user space code compiled with gcc 2.96 to have exactly the
same problem, which makes the use of unwind data on gcc 2.96
applications rather pointless. This problem is in addition to the
incorrect termination of the _start code.
>Is there any way to detect that the application is not using the new
>unwind structures and therefore we should not attempt to utilize
>libunwind? Can you even tell what compiler was used in an image?
The .comment section tells you: objdump -s -j .comment image_name.
.comment contains a series of null delimited strings, one for each
object that was linked into the image. For the problem case, you get a
string like this:
GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-118.7.2)
Scan the .comment section; any object compiled with any gcc 2.96 version
means that the entire program's unwind data is suspect. Don't forget
to check which gcc was used to build the run time libraries.
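
A rough Python sketch of that scan, assuming you already have the raw
bytes of the .comment section in hand (how you extract them is up to
you). The function names are hypothetical, and the second string in
the sample is an invented entry added only for contrast.

```python
def compilers_in_comment(raw: bytes):
    """Split raw .comment bytes into its null delimited strings."""
    return [s.decode("ascii", "replace") for s in raw.split(b"\0") if s]

def has_gcc_296(raw: bytes) -> bool:
    """True if any object in the image was built with some gcc 2.96."""
    return any(s.startswith("GCC: (GNU) 2.96")
               for s in compilers_in_comment(raw))

# First string is the one shown above; the second is hypothetical.
sample = (b"GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-118.7.2)\0"
          b"GCC: (GNU) 3.2.3\0")

for entry in compilers_in_comment(sample):
    print(entry)
print("unwind data suspect:", has_gcc_296(sample))
```

One 2.96 entry is enough to flag the whole image, so the check uses
any() rather than requiring every object to match. The same test would
have to be repeated for each shared library the program loads.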