emacs-devel

Re: Using empty_string as the only "" string


From: Dmitry Antipov
Subject: Re: Using empty_string as the only "" string
Date: Thu, 26 Apr 2007 18:24:01 +0400
User-agent: Thunderbird 1.5.0.7 (X11/20061008)

Stefan Monnier wrote:

PS: But if you're interested in such small optimizations, I have another one
in my local Emacs where the Lisp_String data type is changed to:

   struct Lisp_String
     {
       EMACS_INT size;
       EMACS_INT size_byte : BITS_PER_EMACS_INT - 1;
       unsigned inlined : 1;    /* 0 -> ptr, 1 -> chars; in union below.  */
       INTERVAL intervals;              /* text properties in this string */
       union
       {
         unsigned char *ptr;
         unsigned char chars[STRING_MAXINLINE];
       } data;
     };

this way, on 32bit systems, strings of up to 3 bytes can be represented with
just a Lisp_String without any `sdata'.  On 64bit systems, this can be used
for strings up to 7 bytes long (i.e. almost 50% of all allocated strings,
IIRC).  And it can also be used for all the strings in the pure space (no
matter how long), so it saves about 50KB of pure space (can't remember the
exact number, but IIRC it was more than 10KB and less than 100KB).

I'm interested in _any_ optimization. Here is a brain-damaged :-) Lisp_String
layout I'm thinking about:

#define STRING_IMMEDIATE_SIZE (sizeof (EMACS_INT) * 3 - 2)

struct Lisp_String
  {
    union
    {
      /* Immediate string.  */
      struct
      {
        unsigned immediate : 1;
        unsigned gcmarkbit : 1;
        unsigned size : BITS_PER_CHAR - 1;
        unsigned size_byte : BITS_PER_CHAR - 1;
        unsigned char data[STRING_IMMEDIATE_SIZE];
      } __attribute__ ((packed)) imm;
      /* Contains pointer to sdata.  */
      struct
      {
        unsigned immediate : 1;
        unsigned gcmarkbit : 1;
        unsigned size : BITS_PER_EMACS_INT - 1;
        unsigned size_byte : BITS_PER_EMACS_INT - 1;
        unsigned char *data;
      } __attribute__ ((packed)) dat;
    } u;
    INTERVAL intervals;         /* text properties in this string */
  };

This gives "immediate" strings of up to 9 bytes on 32-bit systems and up to
21 bytes on 64-bit systems (excluding the trailing '\0'). It is not suitable
for long pure strings, btw.
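
The 9-byte and 21-byte figures follow from the field widths in the struct above. A minimal sketch of that arithmetic, under two assumptions: BITS_PER_CHAR equals CHAR_BIT (8 on common platforms), and a pointer occupies one EMACS_INT-sized word. The helper names are hypothetical and do not come from the Emacs sources.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Header bits of the immediate variant:
   immediate + gcmarkbit + size + size_byte.  */
enum { IMM_HEADER_BITS = 1 + 1 + (CHAR_BIT - 1) + (CHAR_BIT - 1) };

/* Bytes left for character data when the union spans three words;
   this mirrors STRING_IMMEDIATE_SIZE for a given word size.  */
static size_t
imm_data_bytes (size_t word_bytes)
{
  assert (word_bytes > 0);
  return word_bytes * 3 - IMM_HEADER_BITS / CHAR_BIT;
}

/* Longest immediate string, reserving one byte for the trailing '\0'.  */
static size_t
imm_max_length (size_t word_bytes)
{
  return imm_data_bytes (word_bytes) - 1;
}

/* Size of the pointer variant: two flag bits, two (word - 1)-bit
   fields, then the data pointer (assumed to be one word).  */
static size_t
dat_variant_bytes (size_t word_bytes)
{
  size_t header_bits = 1 + 1 + 2 * (word_bytes * CHAR_BIT - 1);
  return header_bits / CHAR_BIT + word_bytes;
}
```

With word_bytes set to 4 or 8, imm_max_length gives the two limits quoted above, and dat_variant_bytes comes out at exactly three words, so both union members have the same size.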

Strictly speaking, this is not an optimization: it saves space at the
(minimal?) cost of speed, since most string operations involve at least one
extra conditional expression. For example,

#define STRING_BYTES(STR) \
  ((STR)->size_byte < 0 ? (STR)->size : (STR)->size_byte)

becomes (over?)complicated

#define __IMM_P(STR) ((STR)->u.imm.immediate)
#define __IMMSIZE(STR) \
  ((STR)->u.imm.size_byte < 0 ? (STR)->u.imm.size : (STR)->u.imm.size_byte)
#define __DATSIZE(STR) \
  ((STR)->u.dat.size_byte < 0 ? (STR)->u.dat.size : (STR)->u.dat.size_byte)

#define STRING_BYTES(STR) (__IMM_P (STR) ? __IMMSIZE (STR) : __DATSIZE (STR))
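
To make the extra branch concrete, here is a minimal, simplified sketch of the same dispatch written as functions. It is not the packed bit-field layout above: the flag lives outside the union, the fields are ordinary integers, and every name (sketch_string, sketch_string_bytes, and so on) is hypothetical. One more assumption: the `size_byte < 0` test can only be true for a signed field, so the sketch uses signed fields, whereas the bit-fields above are declared unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Inline capacity, mirroring STRING_IMMEDIATE_SIZE with long standing
   in for EMACS_INT (an assumption of this sketch).  */
#define SKETCH_IMM_MAX (sizeof (long) * 3 - 2)

struct sketch_string
{
  int immediate;                /* 1 -> u.imm is valid, 0 -> u.dat */
  union
  {
    struct { signed char size, size_byte;
             unsigned char data[SKETCH_IMM_MAX]; } imm;
    struct { long size, size_byte;
             const unsigned char *data; } dat;
  } u;
};

/* Same shape as the two-level STRING_BYTES: branch on the flag, then
   apply the "size_byte < 0 means unibyte" convention.  */
static long
sketch_string_bytes (const struct sketch_string *s)
{
  if (s->immediate)
    return s->u.imm.size_byte < 0 ? s->u.imm.size : s->u.imm.size_byte;
  return s->u.dat.size_byte < 0 ? s->u.dat.size : s->u.dat.size_byte;
}

/* Every data access pays for the same conditional.  */
static const unsigned char *
sketch_string_data (const struct sketch_string *s)
{
  return s->immediate ? s->u.imm.data : s->u.dat.data;
}

/* Build a unibyte string, picking the representation by length.
   Long strings borrow TEXT, which must outlive the result.  */
static struct sketch_string
sketch_make_unibyte (const char *text)
{
  struct sketch_string s;
  size_t len;

  assert (text != NULL);
  len = strlen (text);
  memset (&s, 0, sizeof s);
  if (len < SKETCH_IMM_MAX)
    {
      s.immediate = 1;
      s.u.imm.size = (signed char) len;
      s.u.imm.size_byte = -1;   /* unibyte marker */
      memcpy (s.u.imm.data, text, len + 1);
    }
  else
    {
      s.u.dat.size = (long) len;
      s.u.dat.size_byte = -1;   /* unibyte marker */
      s.u.dat.data = (const unsigned char *) text;
    }
  return s;
}
```

A short string such as "abc" ends up stored inline, while anything of SKETCH_IMM_MAX bytes or more falls back to the pointer variant. Callers see the same sketch_string_bytes and sketch_string_data interface either way, at the price of one conditional per access.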

Dmitry




