gcl-devel

Re: [Gcl-devel] cvs build on linux aborts


From: Camm Maguire
Subject: Re: [Gcl-devel] cvs build on linux aborts
Date: 31 Dec 2001 11:30:29 -0500

Greetings!

"Vadim V. Zhytnikov" <address@hidden> writes:

> Camm Maguire wrote:
> 
> > Greetings!  OK we certainly need to track this one down.  In sum, the
> > issue seems to be:
> >
> >         When compiled on a libc 2.2.4 Linux system, gcl will not run on
> >         any Linux system.
> >         When compiled on a <= libc 2.1.x Linux system, gcl will run on
> >         *any* Linux system.
> >
> 
> Not quite. My experience is as follows:
> 
> 1). I can compile GCL on systems with glibc 2.2.2 and 2.2.3.
> 2). Compilation on glibc 2.2.4 fails.
> 3). Binaries compiled on glibc 2.2.3 run on glibc 2.2.4 without
>      any trouble.
> 

OK, I have some preliminary good news.  I've upgraded my spare slow
box to libc 2.2.4, and both cvs current and 2.5.0 compile without
problems.  I'm rechecking with a clean cvs tree as I write, and will
compile maxima with the resulting saved_gcl to be sure, but so far I
cannot reproduce this error with the libc 2.2.4 shipping with Debian
(testing).

Now I'm quite skeptical of the following, but I'll share the
thought anyway in case it does turn out to be the culprit -- the
kernel.  Are all the failing systems 2.4.x, and the working ones
2.2.x?  I do know that in 2.2.4, fault addresses are handled
differently, I believe via the siginfo mechanism ('man sigaction'),
and this is good news for us in general, as it will be more portable
across different Linux arches.  But I suppose it is possible that the
existing GET_FAULT_ADDRESS macro, which reads cr2 from struct
sigcontext, is failing with the new kernels.  I think this unlikely for
two reasons: it seems reasonable that the old sigcontext mechanism
would be maintained on i386 for backward compatibility, and the error
occurs at *compile time*, not run time, so if the kernel were the
problem, the cause would lie somewhere in the headers.  As we port gcl
across Linux arches, we will have to deal with this issue at some
point, since we will likely begin to rely on siginfo and will therefore
need to check at run time that we are not running against 2.2.x or
equivalent.

One thing we can try is to comment out the GET_FAULT_ADDRESS and SGC
macros, run with normal garbage collection only, and see whether the
problem goes away.  This will not be a complete fix, as compiling
maxima on such a gcl segfaults somewhere toward the end on my box,
but the error you are seeing occurs well before that point.  I've
given low priority to tracing the non-SGC garbage collection bug,
since SGC seems to be the way to go on most modern systems.

One other item to check is the binutils version.  Could you report yours?

Another thing I'm investigating: try compiling with the DEBUG macro
defined.  Many components seem to support it, and it has been
helpful in the powerpc builds I've been running.

In general, gcl seems to have two main problem areas: the loading
mechanism (sfaslelf.c for us) and the garbage collector, with its
attendant NULL_OR_ON_CSTACK, SGC, and GET_FAULT_ADDRESS macros.  I
think it will help us greatly to isolate these subsystems and verify
them separately where possible.  To that end, sfasl.c uses a STAND
macro that allows compiling the loader as a stand-alone program and
comparing its results with the output of ld.  This option does not
yet work with sfaslelf.c, but I think it's worth some effort to get it
working, so I'll be looking into this.  Anyone else wanting to jump in
is welcome!

I posted a query to comp.lang.lisp asking about the history and
advisability of 'fasloading', and received a helpful reply.  The
respondent agreed with my suggestion of moving *in the long term*
away from 'fasloading' and toward shared objects/dlopen.  He also
suggested an interesting idea: using an external garbage collector,
the Boehm conservative garbage collector, available on Debian systems
as libgc.  I think it's at least worth experimenting with, and if its
performance is comparable, it may be worth adopting for portability
reasons alone.  He notes that garbage collectors are expected of lisp
systems.  I've read a few pros and cons regarding gc vs. malloc/free
(search Google for garbage collector routines), and must say that I'm
not convinced gc offers any real advantage, while it certainly
introduces many headaches.  Other thoughts are welcome!

The last paragraph is long-term only, IMHO.  First, we need to
understand the existing system thoroughly and port it as far as
practicable with as few changes as possible.  A lot of work has gone
into it, and we need to be absolutely convinced that any change of
this magnitude will indeed be an improvement.

Take care,

> >
> > Please correct the above if it is in error, or confirm that this is
> > the case if you haven't yet.
> >
> > 1) Robert -- do you have ssh access to this box?
> > 2) Both -- can you please compile with -g only (alter 386-linux.defs
> >    as necessary), run under the debugger, find out if this is a
> >    segfault, and if so where?
> 
> At present I'm desperately running out of spare time, but I'll try
> to do some tests in 2-3 days. Sorry :(
> 
> >
> > 3) Vadim -- if memory serves, you tried to link statically, and this
> >    did not work.  Correct?
> 
> Correct.
> 
> >
> >
> > I do know of one explicit glibc issue that Dr. Schelter resolved for me
> > back in 5/2000.  He documented this in the changelog, and it enabled
> > dynamic linking against libc.  It was a change to rsym_elf to strip
> > the @GLIB suffix from symbols so that linking happened correctly.  I had
> > thought that maybe the new libc had a different labelling system, but
> > then static linking would work.  Please confirm that it does not work.
> >
> > 4) Robert, could you post the build output?
> >
> > In the meantime, I have one old box which I think I can spare to
> > upgrade to Debian testing, with the new libc.  It is slow, but perhaps
> > it will serve as another debugging platform for this.
> >
> > Take care,
> >
> > --
> > Camm Maguire                                            address@hidden
> > ==========================================================================
> > "The earth is but one country, and mankind its citizens."  --  Baha'u'llah
> 
> --
> 
> [ Vadim V. Zhytnikov  <address@hidden>  <address@hidden> ]

-- 
Camm Maguire                                            address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah


