texmacs-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Texmacs-dev] Cache profiling of TeXmacs 1.0.3.9


From: David Allouche
Subject: Re: [Texmacs-dev] Cache profiling of TeXmacs 1.0.3.9
Date: Sat, 22 May 2004 09:43:32 +0200
User-agent: Mutt/1.5.5.1+cvs20040105i

On Fri, May 21, 2004 at 10:47:05AM +0200, Joris van der Hoeven wrote:
> 
> >   1. Guile is crap, but we already knew that.
> >      But TeXmacs does not help it much either. When working with the
> >      hide-show plugin, which uses big trees, I have discovered that the
> >      fact that that the size of tree objects which is reported to the
> >      garbage collector is wrong (only the root node, not the whole tree)
> >      can cause guile to spend all its time collecting, bringing TeXmacs
> >      to a halt. I believe that happens because it does run out of
> >      memory, but it does not increase the heap size appropriately since
> >      it believe it manages much less memory than it does actually.
> 
> This seems to be a bug in Guile. I followed the Guile instructions
> for reporting the size of trees (Guile "smobs").

You know, what I with no longer being a paid texmacs dev is that I can
tell you up front and publicly when you are being an idiot (more
colorful language to follow, if symptoms persist). It will definitely
help me feel better.

Even your own code disagree with what you are saying. The comment in the
code says: "should be replaced by total size of the tree".

> >   2. TeXmacs has many cache misses, but they seem to happen all over the
> >      code. So, the issue are either general design problems, or are
> >      faults in commonly used data structures.
> 
> I do not understand the meaning of "many" here; I am not an expert on
> cache issues; maybe someone can detail. From the statistics I got
> the idea that there is approximately one cache miss for 1000 accesses
> in TeXmacs (and 1/100 for Guile).

Good point. In fact all this message was largely snake oil. David seems
willing to work on optimization, so I tell him when I think one can gain
on performance. Neither you nor I really understand what that data
really mean.

> >      Here, my usual suspects are needless indirections. They might make
> >      not a big difference. But that's something which should be tried.
> 
> Apart from the tree/array issue you mentioned, I do not see many
> other points. I rather have the impression that it is the global
> design of random memory access which is responsable for most
> cache misses. We would need to compactify memory in order to
> reduce that.

Compacting memory is difficult, and can lead to many problems, and even
more indirection, etc... risky way to go.

The needless array indirection is not a big problem to fix, and actually
they would not have happened in the first time if texmacs had been
written in C++ and not in a "scheme-c++ lookalike thing while keeping an
eye on mathemagix" of your own. Use the dialects of the programming
language, or be doomed to eternal suffering.

Rather keep the things simple, try fixing obvious and simple problem,
and see what that yields.


> >   3. The top faulters are counted pointer constructors and destructors
> >      (not real objects ctors and dtors, just pesky counted pointers)
> >      and the memory allocator.
> 
> That might be; we get back to the const T& point.

Exactly.

> 
> >      Regarding the allocator, there is probably a lot of good
> >      bibliography on that very non-trivial issue. The TeXmacs allocator
> >      is reasonable, but it seems naive. Using pools for related objects
> >      might help, but that may be difficult to implement and the result
> >      are uncertain.
> 
> In that case, I would rather opt for a copying compactifier, which can be
> launched in spare time. But I became a bit discouraged by your experiences
> with GC to dig further into this. Nevertheless, I do think that a better
> memory allocation scheme is possible. But it will probably be time
> consuming to implement.

Pooled allocation seems a reasonable and not too complex way to go. Much
less complex than some memory compactifier research project.

> >   4. Another remarkable faulter is the string comparison.
> >
> >      The names of all typesetter variable should be interned, and
> >      string objects could be either interned or self-contained. That
> >      will add a boolen to string_rep, but that's probably worth the
> >      cost. To make the best use of cache loads, interned strings should
> >      be stored in a separate pool, whose size can be known
> >      at compilation time.
> 
> Yes, this should certainly be considered sometime. In fact,
> we might use symbols instead of strings as the labels of all trees.
> That would make compairison much faster, even though usual string
> operations like << on labels would become slower. Moreover,
> the actual strings might be stored in a separate pool which is
> compactified at spare time.

Not saying that.

Saying that typesetter variable names (which are not labels of trees,
because they are used in ASSIGN and WITH nodes, in addition to VALUE
nodes) should be interned, and that string nodes of tree should be
tested against interned names at tree creation.

Document string nodes are much smaller a problem. I do not understand
what you have with memory compaction. That a really hard problem, and
it's going to be a long time before you are sure that no code in texmacs
relies on fixed object addresses.

But, hey, it's your research project. If that suits your fancy, you can
stick it up your nose.

-- 
                                                            -- ddaa

David Allouche         | GNU TeXmacs -- Writing is a pleasure
Free software engineer |    http://www.texmacs.org
   http://ddaa.net     |    http://savannah.gnu.org/projects/texmacs
   address@hidden  |    address@hidden
TeXmacs is NOT a LaTeX front-end and is unrelated to emacs.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]