guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH] Improved `scm_from_locale_symbol ()' + `scm_take_locale_symbol (


From: Ludovic Courtès
Subject: [PATCH] Improved `scm_from_locale_symbol ()' + `scm_take_locale_symbol ()'
Date: Mon, 19 Dec 2005 18:23:40 +0100
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)

Hi,

The patch below does two things:

1.  It introduces `scm_take_locale_symbol ()'.

2.  It modifies `scm_from_locale_symbol ()' so that it doesn't create a
    Scheme string to do the job.

This second modification has a nice effect: it can significantly reduce
the number of objects created at load-time.  Unfortunately, Guile's
built-in reader always produces Scheme strings (in `scm_read_token ()')
so it cannot benefit from this optimization.

Using a slightly modified version of `guile-reader' which does not
create Scheme strings when reading a symbol, I tried to measure the
improvement compared to Guile's built-in reader.  Basically, I had Guile
load a program that defines 20000 variables[*], first with Guile's
built-in reader, then with `guile-reader'.

With Guile's built-in reader:

   %   cumulative   self              self     total           
  time   seconds   seconds    calls   s/call   s/call  name    
  20.85     17.86    17.86    41385     0.00     0.00  ceval
  14.05     29.89    12.03    49963     0.00     0.00  scm_i_sweep_card
  12.50     40.60    10.71  2191128     0.00     0.00  scm_gc_mark_dependencies
   8.49     47.87     7.27     3149     0.00     0.01  
scm_i_mark_weak_vector_non_weaks
   5.50     52.58     4.72  4729157     0.00     0.00  scm_cell
   5.43     57.23     4.65  5710240     0.00     0.00  scm_gc_mark
  ...
   0.08     83.46     0.07    20687     0.00     0.00  scm_i_make_string

With `guile-reader':

   %   cumulative   self              self     total           
  time   seconds   seconds    calls   s/call   s/call  name    
  23.59     17.66    17.66    41385     0.00     0.00  ceval
  14.34     28.39    10.73    46707     0.00     0.00  scm_i_sweep_card
  11.24     36.80     8.41  1810700     0.00     0.00  scm_gc_mark_dependencies
   7.99     42.78     5.98     2712     0.00     0.01  
scm_i_mark_weak_vector_non_weaks
   6.32     47.51     4.73  4729153     0.00     0.00  scm_cell
   5.75     51.81     4.31  4767765     0.00     0.00  scm_gc_mark
  ...
   0.00     74.83     0.00      687     0.00     0.00  scm_i_make_string

The timings observed are around 15 s. (w/ Guile's built-in reader)
vs. 13 s. on my 500 MHz G4.

Clearly, the mark phase is much quicker as fewer strings were created in
the second case.  Of course, it would be nice if the built-in reader
could benefit from this as well, but this requires a fair amount of
(tedious) work.

Besides, `scm_take_locale_symbol ()' could be beneficial to application
writers as well.

Thanks,
Ludovic.

[*] Produced by:

    (with-output-to-file "t.scm"
      (lambda ()
        (for-each (lambda (x)
                    (format #t "(define sym~a ~a)~%" x x))
                  (iota 20000))))


libguile:

2005-12-19  Ludovic Courtès  <address@hidden>

        * strings.c (scm_i_take_stringbufn): New.
        (scm_i_c_take_symbol): New.
        (scm_take_locale_stringn): Use `scm_i_take_stringbufn ()'.

        * strings.h (scm_i_c_take_symbol): New.
        (scm_i_take_stringbufn): New.

        * symbols.c  (lookup_interned_symbol): New function.
        (scm_i_c_mem2symbol): New function.
        (scm_i_mem2symbol): Use `lookup_symbol ()'.
        (scm_from_locale_symbol): Use `scm_i_c_mem2symbol ()'.  This avoids
        creating a new Scheme string.
        (scm_from_locale_symboln): Likewise.
        (scm_take_locale_symbol): New.
        (scm_take_locale_symboln): New.

        * symbols.h (scm_take_locale_symbol): New.
        (scm_take_locale_symboln): New.


doc/ref:

2005-12-19  Ludovic Courtès  <address@hidden>

        * api-data.texi (Operations Related to Symbols):
        Documented `scm_take_locale_symbol ()'.


--- orig/doc/ref/api-data.texi
+++ mod/doc/ref/api-data.texi
@@ -4551,6 +4551,16 @@
 specified explicitly by @var{len}.
 @end deffn
 
address@hidden  {C Function} SCM scm_take_locale_symbol (char *str)
address@hidden {C Function} SCM scm_take_locale_symboln (char *str, size_t len)
+Like @code{scm_from_locale_symbol} and @code{scm_from_locale_symboln},
+respectively, but also frees @var{str} with @code{free} eventually.
+Thus, you can use this function when you would free @var{str} anyway
+immediately after creating the Scheme string.  In certain cases, Guile
+can then use @var{str} directly as its internal representation.
address@hidden deftypefn
+
+
 Finally, some applications, especially those that generate new Scheme
 code dynamically, need to generate symbols for use in the generated
 code.  The @code{gensym} primitive meets this need:


--- orig/libguile/strings.c
+++ mod/libguile/strings.c
@@ -122,6 +122,17 @@
     }
 }
 
+/* Return a new stringbuf whose underlying storage consists of the LEN octets
+   pointed to by STR.  */
+SCM_C_INLINE SCM
+scm_i_take_stringbufn (char *str, size_t len)
+{
+  scm_gc_register_collectable_memory (str, len, "stringbuf");
+
+  return scm_double_cell (STRINGBUF_TAG, (scm_t_bits) str,
+                         (scm_t_bits) len, (scm_t_bits) 0);
+}
+
 SCM
 scm_i_stringbuf_mark (SCM buf)
 {
@@ -412,6 +423,29 @@
                          (scm_t_bits) hash, SCM_UNPACK (props));
 }
 
+SCM
+scm_i_c_make_symbol (const char *name, size_t len,
+                    scm_t_bits flags, unsigned long hash, SCM props)
+{
+  SCM buf = make_stringbuf (len);
+  memcpy (STRINGBUF_CHARS (buf), name, len);
+
+  return scm_double_cell (scm_tc7_symbol | flags, SCM_UNPACK (buf),
+                         (scm_t_bits) hash, SCM_UNPACK (props));
+}
+
+/* Return a new symbol that uses the LEN bytes pointed to by NAME as its
+   underlying storage.  */
+SCM
+scm_i_c_take_symbol (char *name, size_t len,
+                    scm_t_bits flags, unsigned long hash, SCM props)
+{
+  SCM buf = scm_i_take_stringbufn (name, len);
+
+  return scm_double_cell (scm_tc7_symbol | flags, SCM_UNPACK (buf),
+                         (scm_t_bits) hash, SCM_UNPACK (props));
+}
+
 size_t
 scm_i_symbol_length (SCM sym)
 {
@@ -842,12 +876,10 @@
       str[len] = '\0';
     }
 
-  buf = scm_double_cell (STRINGBUF_TAG, (scm_t_bits) str,
-                         (scm_t_bits) len, (scm_t_bits) 0);
+  buf = scm_i_take_stringbufn (str, len);
   res = scm_double_cell (STRING_TAG,
                          SCM_UNPACK (buf),
                          (scm_t_bits) 0, (scm_t_bits) len);
-  scm_gc_register_collectable_memory (str, len+1, "string");
   return res;
 }
 


--- orig/libguile/strings.h
+++ mod/libguile/strings.h
@@ -124,6 +124,12 @@
 
 SCM_API SCM scm_i_make_symbol (SCM name, scm_t_bits flags, 
                               unsigned long hash, SCM props);
+SCM_API SCM
+scm_i_c_make_symbol (const char *name, size_t len,
+                    scm_t_bits flags, unsigned long hash, SCM props);
+SCM_API SCM
+scm_i_c_take_symbol (char *name, size_t len,
+                    scm_t_bits flags, unsigned long hash, SCM props);
 SCM_API const char *scm_i_symbol_chars (SCM sym);
 SCM_API size_t scm_i_symbol_length (SCM sym);
 SCM_API SCM scm_i_symbol_substring (SCM sym, size_t start, size_t end);
@@ -144,6 +150,7 @@
 SCM_API void scm_i_get_substring_spec (size_t len,
                                       SCM start, size_t *cstart,
                                       SCM end, size_t *cend);
+SCM_API SCM scm_i_take_stringbufn (char *str, size_t len);
 
 /* deprecated stuff */
 


--- orig/libguile/symbols.c
+++ mod/libguile/symbols.c
@@ -85,43 +85,79 @@
 }
 
 static SCM
-scm_i_mem2symbol (SCM str)
+lookup_interned_symbol (const char *name, size_t len,
+                       unsigned long raw_hash)
 {
-  const char *name = scm_i_string_chars (str);
-  size_t len = scm_i_string_length (str);
+  /* Try to find the symbol in the symbols table */
+  SCM l;
+  unsigned long hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
+
+  for (l = SCM_HASHTABLE_BUCKET (symbols, hash);
+       !scm_is_null (l);
+       l = SCM_CDR (l))
+    {
+      SCM sym = SCM_CAAR (l);
+      if (scm_i_symbol_hash (sym) == raw_hash
+         && scm_i_symbol_length (sym) == len)
+       {
+         const char *chrs = scm_i_symbol_chars (sym);
+         size_t i = len;
+
+         while (i != 0)
+           {
+             --i;
+             if (name[i] != chrs[i])
+               goto next_symbol;
+           }
+
+         return sym;
+       }
+    next_symbol:
+      ;
+    }
+
+  return SCM_BOOL_F;
+}
 
+static SCM
+scm_i_c_mem2symbol (const char *name, size_t len)
+{
+  SCM symbol;
   size_t raw_hash = scm_string_hash ((const unsigned char *) name, len);
   size_t hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
 
+  symbol = lookup_interned_symbol (name, len, raw_hash);
+  if (symbol != SCM_BOOL_F)
+    return symbol;
+
   {
-    /* Try to find the symbol in the symbols table */
+    /* The symbol was not found - create it. */
+    SCM symbol = scm_i_c_make_symbol (name, len, 0, raw_hash,
+                                     scm_cons (SCM_BOOL_F, SCM_EOL));
 
-    SCM l;
+    SCM slot = SCM_HASHTABLE_BUCKET (symbols, hash);
+    SCM cell = scm_cons (symbol, SCM_UNDEFINED);
+    SCM_SET_HASHTABLE_BUCKET (symbols, hash, scm_cons (cell, slot));
+    SCM_HASHTABLE_INCREMENT (symbols);
+    if (SCM_HASHTABLE_N_ITEMS (symbols) > SCM_HASHTABLE_UPPER (symbols))
+      scm_i_rehash (symbols, scm_i_hash_symbol, 0, "scm_mem2symbol");
 
-    for (l = SCM_HASHTABLE_BUCKET (symbols, hash);
-        !scm_is_null (l);
-        l = SCM_CDR (l))
-      {
-       SCM sym = SCM_CAAR (l);
-       if (scm_i_symbol_hash (sym) == raw_hash
-           && scm_i_symbol_length (sym) == len)
-         {
-           const char *chrs = scm_i_symbol_chars (sym);
-           size_t i = len;
-
-           while (i != 0)
-             {
-               --i;
-               if (name[i] != chrs[i])
-                 goto next_symbol;
-             }
-
-           return sym;
-         }
-      next_symbol:
-       ;
-      }
+    return symbol;
   }
+}
+
+static SCM
+scm_i_mem2symbol (SCM str)
+{
+  SCM symbol;
+  const char *name = scm_i_string_chars (str);
+  size_t len = scm_i_string_length (str);
+  size_t raw_hash = scm_string_hash ((const unsigned char *) name, len);
+  size_t hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
+
+  symbol = lookup_interned_symbol (name, len, raw_hash);
+  if (symbol != SCM_BOOL_F)
+    return symbol;
 
   {
     /* The symbol was not found - create it. */
@@ -139,6 +175,7 @@
   }
 }
 
+
 static SCM
 scm_i_mem2uninterned_symbol (SCM str)
 {
@@ -348,13 +385,50 @@
 SCM
 scm_from_locale_symbol (const char *sym)
 {
-  return scm_string_to_symbol (scm_from_locale_string (sym));
+  return scm_i_c_mem2symbol (sym, strlen (sym));
 }
 
 SCM
 scm_from_locale_symboln (const char *sym, size_t len)
 {
-  return scm_string_to_symbol (scm_from_locale_stringn (sym, len));
+  return scm_i_c_mem2symbol (sym, len);
+}
+
+SCM
+scm_take_locale_symboln (char *sym, size_t len)
+{
+  SCM res;
+  unsigned long raw_hash;
+
+  if (len == (size_t)-1)
+    len = strlen (sym);
+  else
+    {
+      /* Ensure STR is null terminated.  A realloc for 1 extra byte should
+         often be satisfied from the alignment padding after the block, with
+         no actual data movement.  */
+      sym = scm_realloc (sym, len+1);
+      sym[len] = '\0';
+    }
+
+  raw_hash = scm_string_hash ((unsigned char *)sym, len);
+  res = lookup_interned_symbol (sym, len, raw_hash);
+  if (res != SCM_BOOL_F)
+    {
+      free (sym);
+      return res;
+    }
+
+  res = scm_i_c_take_symbol (sym, len, 0, raw_hash,
+                            scm_cons (SCM_BOOL_F, SCM_EOL));
+
+  return res;
+}
+
+SCM
+scm_take_locale_symbol (char *sym)
+{
+  return scm_take_locale_symboln (sym, (size_t)-1);
 }
 
 void


--- orig/libguile/symbols.h
+++ mod/libguile/symbols.h
@@ -56,6 +56,8 @@
 
 SCM_API SCM scm_from_locale_symbol (const char *str);
 SCM_API SCM scm_from_locale_symboln (const char *str, size_t len);
+SCM_API SCM scm_take_locale_symbol (char *sym);
+SCM_API SCM scm_take_locale_symboln (char *sym, size_t len);
 
 /* internal functions. */
 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]