[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gcl-devel] Re: gzipped tar file on profiling

From: Matt Kaufmann
Subject: Re: [Gcl-devel] Re: gzipped tar file on profiling
Date: Tue, 30 Dec 2003 17:20:16 -0600

Thank you, Camm.  We already do some fixnum declaration and proclaiming in ACL2
for GCL, but I can imagine that we could benefit from doing some more.  I'll
try to do that at some point.  We can't just proclaim all integers to be
fixnums, however, in case that is what sys-proclaim.lsp is intended to do.  We
do our own proclaiming automatically; roughly speaking, we use declare forms of
a defun for input types and THE forms for output types (but with macroexpansion
as necessary).  Our top-level function that does this is function proclaim-file
in source file acl2-fns.lisp, in case you're interested.

Is there some way that make_fixnum could avoid creating new fixnum objects more
than once for each fixnum up to some limit, and also avoids garbage collecting
such objects?  Maybe there is already such a mechanism in GCL and the limit
merely needs to be increased, in order to fix most of this performance issue
(since you say that GC time is the problem, presumably rather than
fixnum-building time).  What do you think?

Thanks --
-- Matt
   Cc: address@hidden, address@hidden, address@hidden,
   From: Camm Maguire <address@hidden>
   Date: 30 Dec 2003 17:11:42 -0500
   User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
   Content-Type: text/plain; charset=us-ascii

   Greetings!  Just a preliminary note here -- the overwhelming majority
   of the gbc cost with or without sgc is in collecting fixnum garbage.
   Furthermore, the vast majority of this is generated in your compiled
   lisp code.  

   As you probably know, if GCL can figure out that an integer is a
   fixnum, usually via proclaims or declares, it will compile it directly
   to a machine long integer avoiding any memory allocation.  Otherwise,
   it calls the allocating routine make_fixnum.  

   At present, the only known (to me) system macro which often
   unnecessarily allocates fixnums is dotimes -- if your code uses this a
   lot let me know and I'll send you a replacement to try.  

   Otherwise these massive calls to make_fixnum are likely in code
   written by the compiler.  You likely know which functions of yours
   take the most time.  You can look for calls to make_fixnum via the
   (disassemble...)  function which will output the generated C code.  A
   simple (declare (fixnum i)) or similar where appropriate will likely
   eliminate most of the runtime.  

   To be a bit more specific, to avoid any allocation, the compiler needs
   to know both that the bounds of the integer are appropriate (which can
   be gleaned from the (declare...) statement), and that any functions
   using the variable as an argument have been compiled to take machine
   integer arguments (which information typically comes from
   (proclaims...)).  The compiler will "promote" a machine integer to a
   lisp object via allocation if the integer is passed to a function
   taking lisp objects as arguments.  You also probably know about the
   automatic way of making a sys-proclaim.lsp for use by the compiler
   in this regard.  Briefly, just load gcl_collectfn.o and execute
   (compiler::emit-fn t) before compiling (generating .fn files for each
   .lisp file), and then (compiler::make-all-proclaims "*.fn").  Then
   load sys-proclaim.lsp before recompiling, or put this load command in
   a gcl_cmpinit.lsp file in the source directory.  All this needs to be
   automated in my opinion.

   The only current drawback of gprof profiling is that compiled lisp
   code as conventionally loaded into the program's .data section is all
   lumped together in one entry labeled <hicore>.  I'll be working on
   improving this at some point.  For now, you can build your final
   executable in an alternate way which will retain the identities of the
   compiled lisp functions for the purposes of profiling should you be
   unable to identify the culprit yourself.  Basically you set the
   variable compiler::*default-system-p* to t before compiling, and
   instead of using (si::save-system...), do

           (list "module1.o" "module2.o" ...)
           "(progn (lisp-code-to-run-after-loading-and-before-saving))")

   Here is how I do this for acl2 (this method is alas required on some

   BINARIES:=acl2-fns.o axioms.o basis.o translate.o type-set-a.o type-set-b.o 
rewrite.o simplify.o \
             bdd.o other-processes.o induct.o history-management.o prove.o 
defuns.o proof-checker-a.o\
             defthm.o other-events.o ld.o proof-checker-b.o tutorial.o 
interface-raw.o \
             linear-a.o linear-b.o non-linear.o TMP1.o
   BIN:=$(patsubst %,"%",$(BINARIES))

   INIT:=(initialize-acl2 (quote include-book) *acl2-pass-2-files* t t)
   POST:=(load \"foo.lsp\")\
         (setq compiler::*default-system-p* nil)\
         (in-package \"acl2\")\
         (save-acl2 (quote (initialize-acl2 (quote include-book) 
*acl2-pass-2-files* t)) \"saved_acl2\"))

   nsaved_acl2: foo.lsp
           echo '(load "foo.lsp")(in-package "acl2")(compile-acl2)' | gcl
           echo '(load "foo.lsp")(in-package "acl2")(load-acl2)$(INIT)' | gcl
           echo '(compiler::link (list $(BIN)) "$@" "$(POST)" "" nil)' |gcl

   I'd be interested in knowing any results, as my current guess is that
   Allegro can infer a fixnum where we cannot at present.  If you can
   give me the details of such misinference, I can try to fix it.

   I would have guessed that you were doing a lot of array indexing, but
   this appears not to be the case.

   Take care, 

   Matt Kaufmann <address@hidden> writes:

   > Camm --
   > The run has completed.  It seemed to take much longer without SGC.  I'll 
   > you the uuencoded gzipped tar file (509K) now (see the README after you 
   > it).  I'll spare those CCed on this email from that file.
   > -- Matt
   >    cc: address@hidden, address@hidden, address@hidden,
   >       address@hidden
   >    From: Camm Maguire <address@hidden>
   >    Date: 17 Dec 2003 17:11:44 -0500
   >    User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
   >    Content-Type: text/plain; charset=us-ascii
   >    Thanks Matt!  I got it now.  I wonder if you had also saved the gbc
   >    output -- if I recall you usually run with *notify-gbc* turned on.
   >    Also, a (room) before and after would be of interest, as well as the
   >    effect of (si:sgc-on nil).
   >    On the one hand, GBC is clearly the performance culprit, and that is
   >    very good news, as it is relatively easy to address with a better
   >    memory layout.  What is odd about your profiling though is that
   >    marking is taking up the lion's share of the time.  In normal GBC, its
   >    the sweeper that costs, at least in the tests I've run so far.  And I
   >    think I know why, it is the following extra pass that's required with
   >    sgc marking:
   >      /* mark all non recent data on writable pages */
   >      {
   >        int t,i=page(heap_end);
   >        struct typemanager *tm;
   >        char *p;
   >        while (--i >= 0) {
   >     if (WRITABLE_PAGE_P(i)
   >         && (t=type_map[i]) < (int) t_end)
   >       ;
   >     else 
   >       continue;
   >     tm=tm_of(t);
   >     p=pagetochar(i);
   >     if ( t == t_cons) 
   >       for (j = tm->tm_nppage; --j >= 0; p += tm_table[t_cons].tm_size/*  
sizeof(struct cons) */) {
   >         object x = (object) p; 
   >         if (SGC_OR_M(x)) 
   >           continue;
   >         if (x->d.t==t_cons) {
   >           x->d.m = TRUE; 
   >           sgc_mark_cons(x);
   >         } else
   >           sgc_mark_object1(x);
   >       }
   >     else {
   >       int size=tm->tm_size;
   >       for (j = tm->tm_nppage; --j >= 0; p += size) {
   >         object x = (object) p; 
   >         if (SGC_OR_M(x)) continue;
   >         sgc_mark_object1(x);
   >       }
   >     }
   >        }
   >      }
   >    I'll have to think a bit about why this was put in, but in short, sgc
   >    need not be a performance boost if most of your memory is writable
   >    anyway, and might (speculation) even be less efficient.  This pertains
   >    to that note I sent earlier about the enabling of SGC in ACL2, which
   >    you asked me to clarify, and I never did, as I don't yet have a good
   >    answer as to a metric to determine when SGC helps and when it does
   >    not.  One thing should be clear, and that is many heap expansions,
   >    which require an (si::sgc-on nil)(gbc t)(si::sgc-on t), are very
   >    expensive, as these steps require that the read-only subset of memory
   >    be determined all over again.
   >    To be a bit more verbose, SGC works by taking the memory set at time
   >    of (si:sgc-on t), making it read-only, and allocating new writable
   >    pages for use in subsequent allocation -- these are "SGC" pages,
   >    i.e. the memory set is divided into two.  Write attempts on the old
   >    memory make those pages writable, but they are not "SGC" pages as far
   >    as the algorithm goes, and must be garbage collected separately.  The
   >    idea is to enable sgc at the point where the most memory that will
   >    essentially have no further modifications can be set aside.  Setting
   >    aside a small amount of truly read-only memory, or a lot of memory
   >    which has significant subsequent modifications, both suffer from
   >    inefficiencies.
   >    >From looking at b.out, for example, half the time is spend in
   >    sgc_mark_phase itself, and half in its main child process,
   >    sgc_mark_object1.  This suggests to me that the ostensibly read-only
   >    memory which was actually modified and thus made writable (writable
   >    non-SGC pages) is substantial.  The loop shown above is similar to
   >    that in the sweeper which is the main cost in normal GBC, so the more
   >    memory it has to process, the worse performance will be.
   >    Anyway, these are just initial thoughts.  I'd like Vadim's input if
   >    possible to construct for this case an analog of the maxima init file
   >    he's just put together, so we can get a performance comparison with
   >    ACL short of GBC.  This in turn will help focus our efforts on where
   >    our out of the box algorithms and parameter settings can be improved. 
   >    Take care, 
   >    Matt Kaufmann <address@hidden> writes:
   >    > Camm --
   >    > 
   >    > In case email from AMD to you is problematic, I've also just now put 
   >    > profile results on the web as a gzipped tar file:
   >    > 
   >    > /u/www/users/moore/acl2/seminar/temp/camm.tar.gz
   >    > or
   >    > http://www.cs.utexas.edu/users/moore/acl2/seminar/temp/camm.tar.gz
   >    > 
   >    > When you let me know you've received it, I'll delete this file.
   >    > 
   >    > -- Matt
   >    > 
   >    > 
   >    > 
   >    -- 
   >    Camm Maguire                                            address@hidden
   >    "The earth is but one country, and mankind its citizens."  --  
   > _______________________________________________
   > Gcl-devel mailing list
   > address@hidden
   > http://mail.gnu.org/mailman/listinfo/gcl-devel

   Camm Maguire                                         address@hidden
   "The earth is but one country, and mankind its citizens."  --  Baha'u'llah

reply via email to

[Prev in Thread] Current Thread [Next in Thread]