Size and length limits for Emacs primitive types and etc data?

From: Oleksandr Gavenko
Subject: Size and length limits for Emacs primitive types and etc data?
Date: Wed, 23 Jan 2013 00:06:04 +0200
While searching I found these sources of information about limits in Emacs:

  (info "(elisp)Programming Types")
                Programming Types
                 Re: stack overflow limit
                 The value of re_max_failures we use now needs 4MB of stack on
                 a 32-bit machine, twice as much on a 64-bit machine. We also
                 need stack space for GC.

From the official docs:

For integers: 28 bits + sign.
For chars: 22 bits.

The following types have unknown or undefined size limits in the manual:


For floats: Emacs uses the IEEE floating point standard where possible. But
which precision exactly (half/single/double)?

/* Lisp floating point type.  */
struct Lisp_Float  /* src/lisp.h */
  {
    union
    {
      double data;
      struct Lisp_Float *chain;
    } u;
  };

It seems to use 64-bit (double precision) IEEE 754 on most 32-bit platforms.
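
As a rough check of what `double` means on a given machine, here is a small C sketch that reads the parameters from <float.h>. It is not Emacs code and the helper names are made up for illustration; on an IEEE 754 platform I would expect 64 bits of storage and a 53-bit significand.

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers, not part of Emacs: they describe the C `double`
   that struct Lisp_Float stores in its `data' member.  */

/* Storage size of a double, in bits.  */
static size_t double_storage_bits (void)
{
  return sizeof (double) * CHAR_BIT;
}

/* Number of base-FLT_RADIX digits in the significand
   (this counts the implicit leading bit on IEEE 754).  */
static int double_significand_digits (void)
{
  return DBL_MANT_DIG;
}

/* Largest e such that FLT_RADIX^(e-1) is a representable finite double.  */
static int double_max_exponent (void)
{
  return DBL_MAX_EXP;
}
```

Printing these three values with printf on the target machine would show which precision the platform's `double` really has.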

Is there any function available at run time that returns the digit and exponent
widths for floats?


For lists: I think their length is not limited at all.


But how many bytes does a symbol take? For example 'foo'?

From src/lisp.h:

typedef struct { EMACS_INT i; } Lisp_Object;

struct Lisp_Symbol
{
  unsigned gcmarkbit : 1;
  ENUM_BF (symbol_redirect) redirect : 3;
  unsigned constant : 2;
  unsigned interned : 2;
  unsigned declared_special : 1;
  Lisp_Object name;
  union {
    Lisp_Object value;
    struct Lisp_Symbol *alias;
    struct Lisp_Buffer_Local_Value *blv;
    union Lisp_Fwd *fwd;
  } val;
  Lisp_Object function;
  Lisp_Object plist;
  struct Lisp_Symbol *next;
};

For 32-bit arch I count 4*6=24 bytes.
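
To double-check this count, here is a sketch with a mock of the struct Lisp_Symbol layout quoted above. The assumption (mine, not from the Emacs sources) is that on a 32-bit build every Lisp_Object and pointer field is one 32-bit word, so each is replaced by `uint32_t`; the names `word32` and `mock_symbol32` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 32-bit machine word (assumed size of Lisp_Object
   and of pointers on a 32-bit build).  */
typedef uint32_t word32;

/* Mock layout only; this is not the real struct Lisp_Symbol.  */
struct mock_symbol32
{
  unsigned gcmarkbit : 1;
  unsigned redirect : 3;
  unsigned constant : 2;
  unsigned interned : 2;
  unsigned declared_special : 1;
  word32 name;
  word32 val;        /* stands for the value/alias/blv/fwd union */
  word32 function;
  word32 plist;
  word32 next;
};

/* Size of the mock symbol in bytes.  */
static size_t mock_symbol32_size (void)
{
  return sizeof (struct mock_symbol32);
}
```

With GCC or Clang the bit-fields share one word, so the mock comes out as six words. This only measures the symbol object itself, not the string that holds its name.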

It seems that Lisp_Object is an index into a hash table of the actual values
(such as the actual name or function code...).


How much memory does a cons cell take?

struct Lisp_Cons
  {
    Lisp_Object car;
    union
    {
      Lisp_Object cdr;
      struct Lisp_Cons *chain;
    } u;
  };

For 32-bit arch I count 4*2=8 bytes.
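
The same kind of check for the cons cell, again as a sketch under the assumption that Lisp_Object and pointers are 32-bit words. The names `word32`, `mock_cons32` and `mock_list32_bytes` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t word32;   /* assumed size of Lisp_Object and pointers */

/* Mock layout only; this is not the real struct Lisp_Cons.  */
struct mock_cons32
{
  word32 car;
  union
  {
    word32 cdr;     /* normal use */
    word32 chain;   /* free-list link, shares storage with cdr */
  } u;
};

/* Size of one mock cons cell in bytes.  */
static size_t mock_cons32_size (void)
{
  return sizeof (struct mock_cons32);
}

/* Bytes of cons cells needed for a proper list of n elements
   (one cell per element; the elements themselves are not counted).  */
static size_t mock_list32_bytes (size_t n)
{
  return n * sizeof (struct mock_cons32);
}
```

Because `cdr` and `chain` live in a union, the chain pointer adds nothing to the size.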


How much memory does a plist take to store a single property?


DEFUN ("plist-put", Fplist_put, Splist_put, 3, 3, 0,
  (Lisp_Object plist, register Lisp_Object prop, Lisp_Object val)
  register Lisp_Object tail, prev;
  Lisp_Object newcell;
  prev = Qnil;
  for (tail = plist; CONSP (tail) && CONSP (XCDR (tail));
       tail = XCDR (XCDR (tail)))

It seems that this takes 2 cons cells, or 8*2=16 bytes.
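
To convince myself of the "two cells per property" reading, here is a toy C model of the loop above. It is only a sketch: `toy_cons` is a made-up struct with plain C pointers, and NULL plays the role of nil.

```c
#include <stddef.h>

/* Toy cons cell, for illustration only; real cells hold Lisp_Objects.  */
struct toy_cons
{
  const char *car;
  struct toy_cons *cdr;   /* NULL plays the role of nil */
};

/* Count properties by stepping two cells at a time, the same way the
   plist-put loop advances with XCDR (XCDR (tail)).  */
static size_t toy_plist_count (const struct toy_cons *plist)
{
  size_t n = 0;
  for (const struct toy_cons *tail = plist;
       tail != NULL && tail->cdr != NULL;
       tail = tail->cdr->cdr)
    n++;
  return n;
}

/* Cons cells used by a plist with n_props properties:
   one cell for each key and one for each value.  */
static size_t toy_plist_cells (size_t n_props)
{
  return 2 * n_props;
}
```

Multiplying `toy_plist_cells (1)` by the 8 bytes per cell counted earlier gives the 16 bytes for one property, not counting the key and value objects themselves.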


How much memory does a string take (strings being used for buffer text and
symbol names)?

typedef struct interval *INTERVAL;
struct Lisp_String
  {
    ptrdiff_t size;
    ptrdiff_t size_byte;
    INTERVAL intervals;         /* Text properties in this string.  */
    unsigned char *data;
  };

It seems to be 4*4 bytes for the struct fields + lengthOf(data) bytes.

The manual says that "strings really contain integers" and "strings are arrays,
and therefore sequences as well".

So does each char (in data) use 4 bytes? It seems not, because:

     To conserve memory, Emacs does not hold fixed-length 22-bit numbers that
  are codepoints of text characters within buffers and strings. Rather, Emacs
  uses a variable-length internal representation of characters, that stores
  each character as a sequence of 1 to 5 8-bit bytes, depending on the
  magnitude of its codepoint.


  Encoded text is not really text, as far as Emacs is concerned, but rather a
  sequence of raw 8-bit bytes. We call buffers and strings that hold encoded
  text "unibyte" buffers and strings, because Emacs treats them as a sequence
  of individual bytes.

With unibyte I understand that it is easy to get a char by index.

But with multibyte I don't understand how. And I don't understand why in this
case strings are arrays. Is it an inefficient array?
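
To show what worries me, here is a naive C sketch of turning a character index into a byte offset in variable-length text. It assumes plain UTF-8 (1 to 4 bytes per character) and valid input; the function names are hypothetical and this is not how Emacs itself does it, which may well use caching to avoid the full scan.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence starting with lead byte b.
   Assumption: plain UTF-8 only; the 5-byte form mentioned in the
   manual is not handled by this sketch.  */
static size_t utf8_seq_len (unsigned char b)
{
  if (b < 0x80)
    return 1;
  if ((b & 0xE0) == 0xC0)
    return 2;
  if ((b & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Byte offset of character number char_index in s (nbytes long),
   or nbytes if the index is past the end.  Linear scan, so the
   cost grows with the index.  */
static size_t char_to_byte (const unsigned char *s, size_t nbytes,
                            size_t char_index)
{
  size_t pos = 0;
  while (char_index > 0 && pos < nbytes)
    {
      pos += utf8_seq_len (s[pos]);
      char_index--;
    }
  return pos < nbytes ? pos : nbytes;
}
```

For the bytes 'a', #xD0 #x91, 'b' the third character starts at byte offset 3, not 2, which is why indexing is not a simple pointer addition as in the unibyte case.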

It seems that buffer text == string:

struct buffer_text   /* from src/buffer.h */
  {
    unsigned char *beg;
    ptrdiff_t gpt;              /* Char pos of gap in buffer.  */
    ptrdiff_t z;                /* Char pos of end of buffer.  */
    ptrdiff_t gpt_byte;         /* Byte pos of gap in buffer.  */
    ptrdiff_t z_byte;           /* Byte pos of end of buffer.  */
    ptrdiff_t gap_size;         /* Size of buffer's gap.  */
    EMACS_INT modiff;           /* This counts buffer-modification events ...  */
    EMACS_INT chars_modiff;     /* This is modified with character change ...  */
    EMACS_INT save_modiff;      /* Previous value of modiff, as of last ...  */
    EMACS_INT overlay_modiff;   /* Counts modifications to overlays.  */
    EMACS_INT compact;          /* Set to modiff each time when compact_buffer ...  */
    ptrdiff_t beg_unchanged;
    ptrdiff_t end_unchanged;
    EMACS_INT unchanged_modified;
    EMACS_INT overlay_unchanged_modified;
    INTERVAL intervals;
    struct Lisp_Marker *markers;
    bool inhibit_shrinking;
  };

So opening a 10 KiB Russian file in cp1251 actually takes 2*10 KiB for the
buffer, as each Russian char in a multibyte string takes 2 bytes... (just type
C-u C-x = and look at "buffer code: #xD0 #x91").
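
The arithmetic behind that estimate, as a C sketch. It assumes the internal multibyte form is UTF-8-like for these code points (which matches the #xD0 #x91 shown for U+0411); the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes plain UTF-8 needs for code point cp
   (assumption: cp is at most U+10FFFF).  */
static size_t utf8_encoded_len (uint32_t cp)
{
  if (cp < 0x80)
    return 1;
  if (cp < 0x800)
    return 2;
  if (cp < 0x10000)
    return 3;
  return 4;
}

/* Bytes needed for `count' copies of code point cp.  */
static size_t text_bytes (uint32_t cp, size_t count)
{
  return count * utf8_encoded_len (cp);
}
```

A cp1251 file stores each Cyrillic letter in one byte, so 10240 letters on disk become about 20480 bytes of buffer text, while ASCII characters such as spaces and newlines stay at one byte each.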

I think that strings have no length limit (except the 28-bit limit for the
index on a 32-bit platform).


It seems that arrays/vectors also have no length limit (except the 28-bit limit
for the index on a 32-bit platform):

/* Regular vector is just a header plus array of Lisp_Objects.  */
struct Lisp_Vector   /* src/lisp.h */
  {
    struct vectorlike_header header;
    Lisp_Object contents[1];
  };

/* A boolvector is a kind of vectorlike, with contents are like a string.  */
struct Lisp_Bool_Vector
  {
    struct vectorlike_header header;
    /* This is the size in bits.  */
    EMACS_INT size;
    /* This contains the actual bits, packed into bytes.  */
    unsigned char data[1];
  };
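
A sketch of how I would estimate the payload of these two types, ignoring the header. The function names are hypothetical, and the word size is passed in as a parameter because it depends on the build.

```c
#include <limits.h>
#include <stddef.h>

/* Bytes needed to hold nbits bits packed into bytes, as the
   `data' member of struct Lisp_Bool_Vector is described.  */
static size_t bool_vector_data_bytes (size_t nbits)
{
  return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}

/* Bytes taken by the `contents' slots of a regular vector,
   given the size of one Lisp_Object in bytes.  */
static size_t vector_contents_bytes (size_t nslots, size_t word_size)
{
  return nslots * word_size;
}
```

So a bool vector of 9 bits needs 2 data bytes, while a regular vector of 10 elements needs 40 bytes of slots when a Lisp_Object is 4 bytes.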


Hash tables are a more complex data type, and I don't understand the limit on
the number of key-value pairs from:

struct Lisp_Hash_Table
{
  struct vectorlike_header header;
  Lisp_Object weak;
  Lisp_Object rehash_size;
  Lisp_Object rehash_threshold;
  Lisp_Object hash;
  Lisp_Object next;
  Lisp_Object next_free;
  Lisp_Object index;
  ptrdiff_t count;
  Lisp_Object key_and_value;
  struct hash_table_test test;
  struct Lisp_Hash_Table *next_weak;
};


Please correct me and answer the questions...

Best regards!

