guile-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Guile-commits] GNU Guile branch, nan-boxing, updated. v2.1.0-28-gb3909e


From: Andy Wingo
Subject: [Guile-commits] GNU Guile branch, nan-boxing, updated. v2.1.0-28-gb3909e8
Date: Wed, 25 May 2011 08:56:44 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=b3909e8f00e95f64f2b6cb40848cb9cf967491f4

The branch, nan-boxing has been updated
       via  b3909e8f00e95f64f2b6cb40848cb9cf967491f4 (commit)
      from  c2f56e4d898b834e16d668b3d0aedfcca728a909 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit b3909e8f00e95f64f2b6cb40848cb9cf967491f4
Author: Andy Wingo <address@hidden>
Date:   Sun May 15 10:26:15 2011 +0200

    tmp

-----------------------------------------------------------------------

Summary of changes:
 libguile/tags.h |  125 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 110 insertions(+), 15 deletions(-)

diff --git a/libguile/tags.h b/libguile/tags.h
index f500902..83ed293 100644
--- a/libguile/tags.h
+++ b/libguile/tags.h
@@ -104,32 +104,127 @@ typedef union SCM SCM;
 
 /* Representation of scheme objects:
  *
- * Guile's type system is designed to work on systems where scm_t_bits and SCM
- * variables consist of at least 32 bits.  The objects that a SCM variable can
+ * Guile's type system is designed to work on systems where scm_t_bits
+ * and SCM values consist of 64 bits.  The objects that a SCM value can
  * represent belong to one of the following two major categories:
  *
- * - Immediates -- meaning that the SCM variable contains an entire Scheme
- *   object.  That means, all the object's data (including the type tagging
- *   information that is required to identify the object's type) must fit into
- *   32 bits.
+ * - Immediates -- meaning that the SCM value contains an entire Scheme
+ *   object.  That means, all the object's data (including the type
+ *   tagging information that is required to identify the object's type)
+ *   must fit into 64 bits.
  *
- * - Non-immediates -- meaning that the SCM variable holds a pointer into the
- *   heap of cells (see below).  On systems where a pointer needs more than 32
- *   bits this means that scm_t_bits and SCM variables need to be large enough
- *   to hold such pointers.  In contrast to immediates, the object's data of
- *   a non-immediate can consume arbitrary amounts of memory: The heap cell
- *   being pointed to consists of at least two scm_t_bits variables and thus
- *   can be used to hold pointers to malloc'ed memory of any size.
+ * - Non-immediates -- meaning that the SCM value contains a pointer
+ *   into the heap, where additional information is stored.
  *
  * The 'heap' is the memory area that is under control of Guile's
  * garbage collector.  The size of the pointed-to memory will be at
- * least 8 bytes, and its address will be 8-byte aligned.
- *
- * This eight-byte alignment means that for a non-immediate -- a heap
+ * least 8 bytes, and its address will be 8-byte aligned.  Thus, on a
+ * 32-bit system, a pointer only has 29 significant bits.  On a 64-bit
+ * system, Guile restricts the address range of the heap to the lower 48
+ * bits of memory, so a pointer has 45 significant bits.
+ *
+ * Guile uses the remaining 19 bits of a SCM value to encode whether the
+ * object is immediate or non-immediate, and, if it is immediate, to
+ * encode the payload.  For example, in a Scheme character, some of the
+ * bits of the SCM indicate that the value is a character, and the other
+ * bits indicate which character it is.
+ *
+ * The precise encoding of an SCM uses a technique known as
+ * "NaN-boxing".  The basic idea is that of the (2^53 - 2) possible bit
+ * patterns for a double-precision IEEE-754 floating point NaN
+ * (not-a-number), current hardware and software only produces one
+ * representation.  Guile takes advantage of this situation to encode
+ * values in the remaining NaN representations.
+ *
+ * The primary advantage of this scheme is that floating-point numbers
+ * can be represented as immediate values.  
+ *
+ * Recall the IEEE-754 double representation:
+ *
+ *   <- most significant bit (MSB)       least significant bit (LSB) ->
+ *
+ *   sign: 1 bit
+ *   | exponent: 11 bits
+ *   | |           mantissa: 52 bits
+ *   ------------------------------------------------------------------
+ *   0 00000000000 0000000000000000000000000000000000000000000000000000
+ * #x0    0   0    0   0   0   0   0   0   0   0   0   0   0   0   0
+ * 
+ * Positive and negative infinity are encoded like this:
+ * 
+ *   +inf.0
+ *   0 11111111111 0000000000000000000000000000000000000000000000000000
+ * #x7    F   F    0   0   0   0   0   0   0   0   0   0   0   0   0
+ *
+ *   -inf.0
+ *   1 11111111111 0000000000000000000000000000000000000000000000000000
+ * #xF    F   F    0   0   0   0   0   0   0   0   0   0   0   0   0
+ *
+ * And the canonical NaN value is like this:
+ *
+ *   +nan.0
+ *   0 11111111111 100000000000000000000000000000000000000000000000000
+ * #x7    F   F    8   0   0   0   0   0   0   0   0   0   0   0   0
+ *
+ * Any other bit pattern with all exponent bits set is a non-canonical
+ * NaN.  For simplicity, Guile uses NaN values with the top 13 bits set,
+ * then uses the next 3 bits as a tag, and the following 48 bits as a
+ * payload.
+ *
+ *   1 11111111111 1TTTPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP
+ *
+ * Some tag bits will indicate immediate values, some will indicate
+ * non-immediate values, and some will indicate invalid values.  If any
+ * of the top 13 bits is unset, then the value is a double.
+ *
+ * At this point we need to talk about garbage collection.  There are
+ * two cases to consider: 32-bit pointers and 64-bit pointers.  They
+ * present different challenges.
+ *
+ * In the 32-bit case, the low 32 bits of the payload may represent a
+ * pointer directly, and the entire upper 32 bits may be taken as the
+ * tag.  The GC will see the low half of the SCM as a potential pointer,
+ * and can trace it.  However we need to take care that non-pointer bit
+ * patterns 
+ *
+ * Guile does one more trick here, which is to rotate the whole tag
+ * space, subtracting off 0xfff800000000000 from the bit
+ * representation.  This will leave the top 13 bits set to zero, if the
+ * SCM value is not a double
+At this point you have a choice: whether to prefer doubles or pointers
+ * representation or the 
+And then subtract off a constant from the tag, so that for pointers,
+ * the first bits can be zero:
+ * 
+ *   000000000000000000PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP
+ *
+ * Then, if the tag indicates that the value is a 
+ *
+ * The eight-byte alignment means that for a non-immediate -- a heap
  * object -- that the three least significant bits of its address will
  * be zero.  Guile can then use these bits as type tags.  These lowest
  * three bits are called tc3-bits, where tc stands for type-code.
  *
+ *
+
+  The heap is the part of memory that is managed by
+ *   the garbage collector.  Guile restricts the heap to a 48-bit
+ *   address range, so this means that some bit representations of SCM
+ *   values encode a 48-bit pointer to the heap.
+ *
+ *   Most non-immediates use the first word of the heap memory pointed
+ *   to be a non-immediate for extra type information.  The notable
+ *   exception to this scheme are pairs, which has space for at least one 
scm_t_bits value,
+ *   which Guile uses to SCM value pointing to the heap
+ *
+On systems where a pointer needs more than
+ *   32 bits this means that scm_t_bits and SCM variables need to be
+ *   large enough to hold such pointers.  In contrast to immediates, the
+ *   object's data of a non-immediate can consume arbitrary amounts of
+ *   memory: The heap cell being pointed to consists of at least two
+ *   scm_t_bits variables and thus can be used to hold pointers to
+ *   malloc'ed memory of any size.
+ *
  * For a given SCM value, the distinction whether it holds an immediate
  * or non-immediate object is based on the tc3-bits of its scm_t_bits
  * equivalent: If the tc3-bits equal #b000, then the SCM value holds a


hooks/post-receive
-- 
GNU Guile



reply via email to

[Prev in Thread] Current Thread [Next in Thread]