guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

redoing SCM representation in 2.2


From: Andy Wingo
Subject: redoing SCM representation in 2.2
Date: Thu, 12 May 2011 12:17:56 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Hello list,

I'm looking at new SCM representation and tagging possibilities in 2.2.
Read the whole mail please, as it's a little complicated.

The current system has a couple of problems:

  1) You can only tell a pair by dereferencing the pointer and checking
     the cell's low bits, and then we need complicated tricks to set the
     pair's bits on the value that is in the car.

     (This strategy was necessary with the old GC, but is not needed
     with the BDW GC.)

  2) You can only tell a procedure by dereferencing the pointer and
     checking the cell's word.

  3) Doubles are allocated on the heap.  This is, as my grandmother
     would say, turrible.

I would like to revisit the SCM representation and tagging scheme in
2.2.  In particular, I would like to look at NaN-boxing.  I explain the
plan a bit below, but if you like to get your information depth-first,
check out:

  
http://blog.mozilla.com/rob-sayre/2010/08/02/mozillas-new-javascript-value-representation/

The concrete implementation strategy that I would like to use is
JSValue, from WebKit's JavaScriptCore (JSC):

  http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSValue.h
  
http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSValueInlineMethods.h

(Almost makes you want C++, doesn't it? :)

                              *   *   *

Recall, a double is laid out like this:

  <- msb                                                      lsb ->

  sign: 1 bit
  | exponent: 11 bits
  | |           mantissa: 52 bits
  ------------------------------------------------------------------
  0 00000000000 0000000000000000000000000000000000000000000000000000

This particular value represents 0.0.  The special values +inf.0 and
-inf.0 are, respectively:

  0 11111111111 0000000000000000000000000000000000000000000000000000
  1 11111111111 0000000000000000000000000000000000000000000000000000

Any other bit pattern with all bits set in the exponent is a NaN.  But
in practice, hardware and software will only generate one particular NaN
value:

  > (number->string (bytevector-u64-native-ref #f64(+nan.0) 0) 2)
  $8 = "111111111111000000000000000000000000000000000000000000000000000"

Which is:

  0 11111111111 100000000000000000000000000000000000000000000000000

So that in practice, we have 51 bits into which we can stuff data, if we
set the MSB of a NaN.  If we assume a 48-bit payload, then:

  1 11111111111 1XXX_______________________________________________

So 8 tags there.

We also have any other combination with the NaN prefix that sets any
bit.  If we assume a 48-bit payload, that gives us

  0 11111111111 XXXX_______________________________________________

with the values 1000 and 0000 reserved, so 14 usable tags there.

                           *   *   *

Now, if you're like me, you probably have two big questions that end in
exclamation marks:

 1) 64 bits?  That's wasting too much memory on 32-bit systems!

 2) 48 bits?  That's not enough for a pointer on 64-bit systems!

With regards to (1), I can only agree.  However, as it gives us more
tags, we can represent more things as immediates (besides doubles,
foreign pointers would be one example), and we can shave off the initial
word from some other structures, so perhaps that's not all bad.

Additionally, we can limit the payload to 32-bits on 32-bit systems, so
that a type check is a simple 32-bit load, and payload extraction is
also a 32-bit load, so that's probably faster than the current thing.
(This is the JSVALUE32_64 case in JSValue, linked below.  It has been
found that comparing 8-bit tags actually takes more time than comparing
32-bit tags; see https://bugzilla.mozilla.org/show_bug.cgi?id=549143 for
the gory details).

Regarding (2), may I commend the following diagram to your perusal:

  http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details

Basically, current x86-64 chips restrict addresses to a 48-bit space.
Other architectures are not so restrictive:

  http://developers.sun.com/solaris/articles/solaris_memory.html

Note that on x86-64, Solaris will give user addresses in the
0xffff... range, unlike Linux or Windows, which only allocate the bottom
47 bits to userspace.  On Sparc64, there is no architecture-imposed VMA
hole as there is in x86-64.

Basically I think it's OK to restrict the Scheme heap to be within a
48-bit space, at least for the next decade or so.  But given that the
total address space is more than 48 bits on many architectures,
arbitrary immediate foreign pointers may not be possible on
e.g. Sparc64.

See also:

  https://bugzilla.mozilla.org/show_bug.cgi?id=577056


                           *   *   *

The difference between JSC's "nan-boxing" and spidermonkey (SM, the
thing from mozilla)'s "nun-boxing" is that JSC prefers the pointer
representation, and SM prefers the double representation.

So for the value 0.0, SM has

  0 00000000000 0000000000000000000000000000000000000000000000000000

And for a pointer to 0x0, SM has

  1 11111111111 1000000000000000000000000000000000000000000000000000

or something like that.  JSC on the other hand rotates the double space,
so that a pointer to 0x0 is

  0 00000000000 0000000000000000000000000000000000000000000000000000

and the double for 0.0 is something more like:

  1 10000000000 1000000000000000000000000000000000000000000000000000

I think we need to do the JSC way, as it appears to be the only way to
work with the BDW GC, currently anyway.  We will need some integration
with the GC to ensure the 48-bit space, but that should be doable.

With this encoding we might also be able to allocate an all-bits-set tag
to fixnums so that we can use the native overflow flags when doing
fxops.

Note that for pointers to the heap, we also still have the lower three
bits to play with.

                           *   *   *

The plan is basically to give it a go and see what happens, in a
branch.  But I'm putting this mail out there so that our architectural
gurus can give it a look-over and point out any portability issues this
strategy might present.

Thanks for reading, and looking forward to the feedback!

Andy
-- 
http://wingolog.org/



reply via email to

[Prev in Thread] Current Thread [Next in Thread]