emacs-devel

Re: Objects layout and tagging scheme


From: Dmitry Antipov
Subject: Re: Objects layout and tagging scheme
Date: Fri, 03 Aug 2012 12:17:57 +0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120713 Thunderbird/14.0

On 08/02/2012 10:09 PM, Paul Eggert wrote:

> For strings, 3 bits are free in the pointers to intervals,
> if we can assume intervals are aligned like other lisp
> objects, which should be possible to arrange.
>
> For vectors the same trick could be played, with next.buffer
> and next.vector.  Presumably we can think of a similar way
> to do it with next.nbytes, since nbytes is limited.
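For readers unfamiliar with the trick quoted above, here is a minimal C sketch of packing flag bits into the low bits of an aligned pointer. It assumes every interval is allocated on an 8-byte boundary, so the 3 low pointer bits are always zero. `struct interval` here is a hypothetical stand-in, and `pack`, `unpack_ptr` and `unpack_flags` are illustrative names, not Emacs functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: intervals are 8-byte aligned, so the 3 low bits of a
   pointer to one are always zero and can hold flags.  */
#define SPARE_BITS 3
#define SPARE_MASK ((uintptr_t) ((1u << SPARE_BITS) - 1))

/* Hypothetical stand-in for Emacs's struct interval.  */
struct interval { long payload; };

/* Combine a pointer and up to 3 flag bits into one word.  */
static uintptr_t
pack (struct interval *p, unsigned flags)
{
  uintptr_t bits = (uintptr_t) p;
  assert ((bits & SPARE_MASK) == 0);  /* the alignment assumption must hold */
  assert (flags <= SPARE_MASK);       /* flags must fit in the spare bits */
  return bits | flags;
}

/* Recover the pointer by clearing the flag bits.  */
static struct interval *
unpack_ptr (uintptr_t word)
{
  return (struct interval *) (word & ~SPARE_MASK);
}

/* Recover the flags by keeping only the spare bits.  */
static unsigned
unpack_flags (uintptr_t word)
{
  return (unsigned) (word & SPARE_MASK);
}
```

For example, `pack (&iv, 5)` on an 8-byte-aligned `iv` yields a word from which `unpack_flags` returns 5 and `unpack_ptr` returns `&iv`. Every dereference must now go through `unpack_ptr`, which is part of the cost discussed below.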

The more I work on the C part of Emacs, the more I hate such bit tricks.
IMHO they're much more obfuscating than all of the xVAR stuff.
Even worse, packing every possible unused bit turns further extensions into
a nightmare. For example, I can follow your suggestions and squeeze 2 bits
into free pointer bits (and add more ugly stuff to Lisp_Cons); next,
someone will ask for 1 more bit (for tricolor marking, why not?), and
the next round of obfuscation will start.

That's why I'm thinking about unified per-object headers. Consider the
following layout: if the LSB (or MSB) of a Lisp_Object is non-zero, the
remaining bits represent a signed integer; otherwise, the remaining bits
represent a pointer to a heap object. Each heap object has a 4-byte header.
In the header, the mark bit, extra GC information and type information
always occupy the same bits for all objects; the rest of the header is
object-specific or unused.
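A minimal sketch of the tagging rule just described, assuming the LSB variant and a machine where a Lisp_Object fits in one `uintptr_t`. The names `make_fixnum`, `fixnum_value`, `make_pointer` and `object_header` are hypothetical, as is the one-field `struct header` standing in for the 4-byte header. Decoding assumes an arithmetic right shift of negative values, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a Lisp_Object is one machine word.  */
typedef uintptr_t Lisp_Object;

/* Hypothetical stand-in for the 4-byte per-object header.  Its 4-byte
   alignment guarantees that a pointer to it has a zero LSB.  */
struct header { uint32_t bits; };

/* LSB set means fixnum; LSB clear means pointer to a heap object.  */
static bool
is_fixnum (Lisp_Object obj)
{
  return (obj & 1) != 0;
}

/* Encode: shift the integer left by one and set the tag bit.  */
static Lisp_Object
make_fixnum (intptr_t n)
{
  return ((uintptr_t) n << 1) | 1;
}

/* Decode: arithmetic right shift drops the tag and keeps the sign
   (implementation-defined in ISO C; arithmetic on GCC and Clang).  */
static intptr_t
fixnum_value (Lisp_Object obj)
{
  assert (is_fixnum (obj));
  return (intptr_t) obj >> 1;
}

/* A pointer needs no tag bits at all, only word alignment.  */
static Lisp_Object
make_pointer (struct header *h)
{
  Lisp_Object obj = (uintptr_t) h;
  assert ((obj & 1) == 0);
  return obj;
}

static struct header *
object_header (Lisp_Object obj)
{
  assert (!is_fixnum (obj));
  return (struct header *) obj;
}
```

With this encoding the type check is a single bit test, and a pointer is used as-is with no masking.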
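For example, the cons header could be:

struct cons_header {
  unsigned type : 6;     /* Lisp_Cons */
  unsigned gcmark : 1;   /* mark bit, same position in every header */
  unsigned gcinfo : 2;   /* extra GC information, same position too */
  unsigned unused : 23;  /* free for future use */
};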

The symbol header could be:

struct symbol_header {
  unsigned type : 6;    /* Lisp_Symbol */
  unsigned gcmark : 1;  /* common to all headers */
  unsigned gcinfo : 2;  /* common to all headers */
  /* Symbol-specific flags follow.  */
  unsigned redirect : 3;
  unsigned constant : 2;
  unsigned interned : 2;
  unsigned declared_special : 1;
  unsigned unused : 15;
};

And so on for the other types. The only disadvantage is increased memory
consumption (Lisp_Cons is the biggest loser here, along with pure objects,
which don't need the gcXXX bits at all). In exchange, we get at least:

- No USE_LSB_TAG hacks: it's enough to ensure that all heap objects
  are aligned to a word boundary;
- No address space limitation, so mmap is welcome;
- Native limits on vector and string length (size really is the size,
  without ARRAY_MARK_FLAG, PSEUDOVECTOR_FLAG and so on);
- No separate mark bitmaps for conses and floats, and so no alignment
  constraints on cons and float blocks; say goodbye to lisp_align_malloc;
- Faster marking and mark checking: no more switch (XTYPE (obj)), because
  the type bits are in the same place for all objects;
- A simple type system without second-class citizens like the current
  misc family.
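To illustrate the marking point, here is a hypothetical sketch in which every object begins with the same header prefix as `cons_header` and `symbol_header`. The type codes, `struct common_header`, `struct cons` and `mark_object` are illustrative assumptions, not existing Emacs code, and the sketch sidesteps fixnums by using NULL for non-pointer fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type codes; the 6-bit type field allows up to 64.  */
enum obj_type { T_CONS = 1, T_SYMBOL = 2 };

/* Common prefix shared by every per-type header in the proposal.  */
struct common_header {
  unsigned type : 6;
  unsigned gcmark : 1;
  unsigned gcinfo : 2;
  unsigned specific : 23;   /* object-specific or unused */
};

/* Hypothetical cons: the header comes first, so a pointer to the
   object converts to a pointer to its header and back.  */
struct cons {
  struct common_header hdr;
  struct common_header *car;  /* NULL stands in for a non-pointer value */
  struct common_header *cdr;
};

/* The mark bit is found the same way for every type.  */
static bool
marked_p (const struct common_header *h)
{
  return h->gcmark != 0;
}

static void
mark_object (struct common_header *h)
{
  /* Uniform check: no switch on the type to locate the mark bit.  */
  if (h == NULL || marked_p (h))
    return;
  h->gcmark = 1;
  /* The type is consulted only to find children to traverse.  */
  if (h->type == T_CONS)
    {
      struct cons *c = (struct cons *) h;
      mark_object (c->car);
      mark_object (c->cdr);
    }
}
```

Marking a cons whose cdr points back to itself terminates, because the uniform mark check stops at already-marked objects regardless of their type.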

I'm not sure this layout can coexist with the current one, so it's
a subject for development on a branch; once it's done, we will
have a solid base for further GC improvements.

Dmitry



