[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Texmacs-dev] Using STL

From: Joris van der Hoeven
Subject: Re: [Texmacs-dev] Using STL
Date: Wed, 15 May 2002 18:37:45 +0200 (MET DST)

On Wed, 15 May 2002, David Allouche wrote:

> On Tuesday 14 May 2002 19:13, Joris van der Hoeven wrote:
> > On Tue, 14 May 2002, David Allouche wrote:
> > >
> > > As I said previously, I need a sorted associative container, and
> > > there is no such thing in your classes.
> > >
> > > A sorted associative container is an associative container like your
> > > hashmap, but additionnaly it provides an ordering based on a
> > > comparison on the key. I need this to implement VALUE assignations in
> > > TMSL efficiently.
> >
> > Can you explain this a little bit more? I am willing to implement this,
> > but I would like to know what is the exact specification and
> > what you would like to do with it.
> I am thinking of imlementing the TMSL enviroment as a linked list of 
> tmsl_env objects. Each object can be either a VAL environment (the top 
> level, and environments created by WITH and APPLY) or a ARG environment 
> (created by EXPAND).

This is probably a *bad* idea (I am sorry to be harsh).
In fact, I did things in this way at the beginning (!),
but I noticed a significant acceleration of the typesetting speed
when I moved to the present solution, which is algorithmically
very nice (it uses a formalization of "patches" and some
operations on them).

You may still see how I implemented
the old solution in Basic/Types/rel_hashmap.*
The main problem is that accessing a variable takes
a time which is proportional to the linked list.
This is already a problem for our simple environments and
will become a bigger problem when environments get
more structured through nesting.

> ASSIGN will affect the innermost environment defining the name being 
> assigned to.
> To implement assignation efficiently we need a notification system. I 
> define an interface with an abstract method notify_assign (I might use a 
> concrete interface later, but at the moment I consider it abstract). When 
> a tmsl_rewriter depends on a val variable, it invokes the method
> ttree tmsl_env::get_val_attach(string, val_observer*, path ip) of its 
> environment with 'this' as a val_observer (we need an uncounted pointer 
> here), and the inverse path of the rewriter in the rewriter structure as 
> ip.
> When a val variable is ASSIGN'd, the containing environment sends a 
> notify assign to all relevant attached val_observers. The relevant 
> attached observers are those which:
>   - are attached to the same name in the same tmsl_env
>   - are after the ASSIGN
>   - are before the next ASSIGN
> After and before are defined by postorder tree traversal to account for 
> constructs of the type (assign "x" (+ "x" "1")).
> ASSIGN location are defined as the inverse path of the 
> tmsl_assign_rewriter in the rewriter structure.
> To implement that notification system, I define a tmsl_val_observer_group 
> class, which records for a given value name in a given tmsl_environment 
> the attached val_observer's, their locations and the locations of the 
> ASSIGN statements (also as inverse paths in the rewriter hierarchy).
> There I need sorted containers to get:
>   - the location of the following ASSIGN
>   - all the tmsl_val_observer between the modified ASSIGN and the next one
> ASSIGN locations can be stored in a simple sorted set, but observers must 
> be stored in a sorted associative container whose key is the observer's 
> inverse path.
> Since rewriters are recomputed in postorder (by recursive descent) there 
> is no notification at initial rewriting. Partial rewritings after 
> document modification may be optimized by using assign_notify to mark 
> rewriters are dirty and use commit to reprocess dirty rewriters.

I was rather thinking about another scheme, similar to the one used
in "bridge", but better optimized:

1. To each node of tmsl_rewriter (the source tree) you associate

   a) A patch which contains the *inverse* of the environment change
      corresponding to rewriting the source tree.

   b) A hashset which contains all environment variables
      on which the rewriting depends (this has not been done in
      the bridge structure, but it will optimize rewriting a lot).

2. This structure is well compatible with the tree-structure for
   most primitives. For instance for a concatenation,
   the inverse patch for the concatenation is the "simplified"
   inverse composite of the patches of its children.
   The environment variables on which the rewriting depends
   is the union of those on which the children depend.

   In other words, when making a local modification,
   you will only need to descend in the tree-structure
   at those places where a real change occurred.
   This is linear in time.

Please compare the two approaches thoroughly and
we will rediscuss this issue.

> > > > I also would like you to avoid using too many new templated classes
> > > > in your program, because this hugely increases the size of the
> > > > executable. So please use the existing classes, like tree,
> > > > array<int>, hashmap<string,tree> etc. whenever possible.
> > >
> > > I do, but I am not willing to write a new one if the STL already has
> > > what I need.
> >
> > So I will take care of that.
> I do not see what benefit you can expect from using a roll-it-yourself 
> implementation. It will not significantly reduce the code size, it may 
> introduce new bugs (any amount of new code may introduce new bugs), and 
> will most likely not perform better than STL code.
> But if you insist on it, and do not ask me to write or debug it, ok.

Yes, I insist and will do it, if necessary of course...

> > OK. Please remind my remark on using NEW_TREES + tree for ttree.
> > This will allow you to concentrate of the really essential stuff:
> > the rewriters.
> I have already written ttree. But If you want, I may merge the code with 
> tree.

Good, can you post your code to texmacs-dev?

> By the way, I have one important question about ttrees:
> should the value of the inverse path used in hash equality comparisons 
> and hash computation?

I think not: we do not want to let == depend on the ip,
so hashing should not depend on it either.
Also, the ip field may change during editing,
which makes it inappropriate for hashing.

> We might need to do so to use hashmaps to implement a rewriting cache. I 
> have not really thought that issue out at the moment, though. 
> I have already figured out that we must not consider the shared tail 
> structure; though I use it to reduce the quadratic(depth) complexity 
> induced by inverse path traversal to linear(depth)-best-case and 
> quadratic(depth)-worse-case.

I do not see why we should not use it;
it is very efficient, even though it is a bit dirty.
But if we implement it well, then the dirtyness should
be completely located in a well determined part of
the source code (a bit like reference counting).


reply via email to

[Prev in Thread] Current Thread [Next in Thread]