[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH] Improved `scm_from_locale_symbol ()' + `scm_take_locale_symbol (
From: |
Ludovic Courtès |
Subject: |
[PATCH] Improved `scm_from_locale_symbol ()' + `scm_take_locale_symbol ()' |
Date: |
Mon, 19 Dec 2005 18:23:40 +0100 |
User-agent: |
Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux) |
Hi,
The patch below does two things:
1. It introduces `scm_take_locale_symbol ()'.
2. It modifies `scm_from_locale_symbol ()' so that it doesn't create a
Scheme string to do the job.
This second modification has a nice effect: it can significantly reduce
the number of objects created at load-time. Unfortunately, Guile's
built-in reader always produces Scheme strings (in `scm_read_token ()')
so it cannot benefit from this optimization.
Using a slightly modified version of `guile-reader' which does not
create Scheme strings when reading a symbol, I tried to measure the
improvement compared to Guile's built-in reader. Basically, I had Guile
load a program that defines 20000 variables[*], first with Guile's
built-in reader, then with `guile-reader'.
With Guile's built-in reader:
% cumulative self self total
time seconds seconds calls s/call s/call name
20.85 17.86 17.86 41385 0.00 0.00 ceval
14.05 29.89 12.03 49963 0.00 0.00 scm_i_sweep_card
12.50 40.60 10.71 2191128 0.00 0.00 scm_gc_mark_dependencies
8.49 47.87 7.27 3149 0.00 0.01
scm_i_mark_weak_vector_non_weaks
5.50 52.58 4.72 4729157 0.00 0.00 scm_cell
5.43 57.23 4.65 5710240 0.00 0.00 scm_gc_mark
...
0.08 83.46 0.07 20687 0.00 0.00 scm_i_make_string
With `guile-reader':
% cumulative self self total
time seconds seconds calls s/call s/call name
23.59 17.66 17.66 41385 0.00 0.00 ceval
14.34 28.39 10.73 46707 0.00 0.00 scm_i_sweep_card
11.24 36.80 8.41 1810700 0.00 0.00 scm_gc_mark_dependencies
7.99 42.78 5.98 2712 0.00 0.01
scm_i_mark_weak_vector_non_weaks
6.32 47.51 4.73 4729153 0.00 0.00 scm_cell
5.75 51.81 4.31 4767765 0.00 0.00 scm_gc_mark
...
0.00 74.83 0.00 687 0.00 0.00 scm_i_make_string
The timings observed are around 15 s. (w/ Guile's built-in reader)
vs. 13 s. on my 500 MHz G4.
Clearly, the mark phase is much quicker as fewer strings were created in
the second case. Of course, it would be nice if the built-in reader
could benefit from this as well, but this requires a fair amount of
(tedious) work.
Besides, `scm_take_locale_symbol ()' could be beneficial to application
writers as well.
Thanks,
Ludovic.
[*] Produced by:
(with-output-to-file "t.scm"
(lambda ()
(for-each (lambda (x)
(format #t "(define sym~a ~a)~%" x x))
(iota 20000))))
libguile:
2005-12-19 Ludovic Courtès <address@hidden>
* strings.c (scm_i_take_stringbufn): New.
(scm_i_c_take_symbol): New.
(scm_take_locale_stringn): Use `scm_i_take_stringbufn ()'.
* strings.h (scm_i_c_take_symbol): New.
(scm_i_take_stringbufn): New.
* symbols.c (lookup_interned_symbol): New function.
(scm_i_c_mem2symbol): New function.
(scm_i_mem2symbol): Use `lookup_symbol ()'.
(scm_from_locale_symbol): Use `scm_i_c_mem2symbol ()'. This avoids
creating a new Scheme string.
(scm_from_locale_symboln): Likewise.
(scm_take_locale_symbol): New.
(scm_take_locale_symboln): New.
* symbols.h (scm_take_locale_symbol): New.
(scm_take_locale_symboln): New.
doc/ref:
2005-12-19 Ludovic Courtès <address@hidden>
* api-data.texi (Operations Related to Symbols):
Documented `scm_take_locale_symbol ()'.
--- orig/doc/ref/api-data.texi
+++ mod/doc/ref/api-data.texi
@@ -4551,6 +4551,16 @@
specified explicitly by @var{len}.
@end deffn
address@hidden {C Function} SCM scm_take_locale_symbol (char *str)
address@hidden {C Function} SCM scm_take_locale_symboln (char *str, size_t len)
+Like @code{scm_from_locale_symbol} and @code{scm_from_locale_symboln},
+respectively, but also frees @var{str} with @code{free} eventually.
+Thus, you can use this function when you would free @var{str} anyway
+immediately after creating the Scheme string. In certain cases, Guile
+can then use @var{str} directly as its internal representation.
address@hidden deftypefn
+
+
Finally, some applications, especially those that generate new Scheme
code dynamically, need to generate symbols for use in the generated
code. The @code{gensym} primitive meets this need:
--- orig/libguile/strings.c
+++ mod/libguile/strings.c
@@ -122,6 +122,17 @@
}
}
+/* Return a new stringbuf whose underlying storage consists of the LEN octets
+ pointed to by STR. */
+SCM_C_INLINE SCM
+scm_i_take_stringbufn (char *str, size_t len)
+{
+ scm_gc_register_collectable_memory (str, len, "stringbuf");
+
+ return scm_double_cell (STRINGBUF_TAG, (scm_t_bits) str,
+ (scm_t_bits) len, (scm_t_bits) 0);
+}
+
SCM
scm_i_stringbuf_mark (SCM buf)
{
@@ -412,6 +423,29 @@
(scm_t_bits) hash, SCM_UNPACK (props));
}
+SCM
+scm_i_c_make_symbol (const char *name, size_t len,
+ scm_t_bits flags, unsigned long hash, SCM props)
+{
+ SCM buf = make_stringbuf (len);
+ memcpy (STRINGBUF_CHARS (buf), name, len);
+
+ return scm_double_cell (scm_tc7_symbol | flags, SCM_UNPACK (buf),
+ (scm_t_bits) hash, SCM_UNPACK (props));
+}
+
+/* Return a new symbol that uses the LEN bytes pointed to by NAME as its
+ underlying storage. */
+SCM
+scm_i_c_take_symbol (char *name, size_t len,
+ scm_t_bits flags, unsigned long hash, SCM props)
+{
+ SCM buf = scm_i_take_stringbufn (name, len);
+
+ return scm_double_cell (scm_tc7_symbol | flags, SCM_UNPACK (buf),
+ (scm_t_bits) hash, SCM_UNPACK (props));
+}
+
size_t
scm_i_symbol_length (SCM sym)
{
@@ -842,12 +876,10 @@
str[len] = '\0';
}
- buf = scm_double_cell (STRINGBUF_TAG, (scm_t_bits) str,
- (scm_t_bits) len, (scm_t_bits) 0);
+ buf = scm_i_take_stringbufn (str, len);
res = scm_double_cell (STRING_TAG,
SCM_UNPACK (buf),
(scm_t_bits) 0, (scm_t_bits) len);
- scm_gc_register_collectable_memory (str, len+1, "string");
return res;
}
--- orig/libguile/strings.h
+++ mod/libguile/strings.h
@@ -124,6 +124,12 @@
SCM_API SCM scm_i_make_symbol (SCM name, scm_t_bits flags,
unsigned long hash, SCM props);
+SCM_API SCM
+scm_i_c_make_symbol (const char *name, size_t len,
+ scm_t_bits flags, unsigned long hash, SCM props);
+SCM_API SCM
+scm_i_c_take_symbol (char *name, size_t len,
+ scm_t_bits flags, unsigned long hash, SCM props);
SCM_API const char *scm_i_symbol_chars (SCM sym);
SCM_API size_t scm_i_symbol_length (SCM sym);
SCM_API SCM scm_i_symbol_substring (SCM sym, size_t start, size_t end);
@@ -144,6 +150,7 @@
SCM_API void scm_i_get_substring_spec (size_t len,
SCM start, size_t *cstart,
SCM end, size_t *cend);
+SCM_API SCM scm_i_take_stringbufn (char *str, size_t len);
/* deprecated stuff */
--- orig/libguile/symbols.c
+++ mod/libguile/symbols.c
@@ -85,43 +85,79 @@
}
static SCM
-scm_i_mem2symbol (SCM str)
+lookup_interned_symbol (const char *name, size_t len,
+ unsigned long raw_hash)
{
- const char *name = scm_i_string_chars (str);
- size_t len = scm_i_string_length (str);
+ /* Try to find the symbol in the symbols table */
+ SCM l;
+ unsigned long hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
+
+ for (l = SCM_HASHTABLE_BUCKET (symbols, hash);
+ !scm_is_null (l);
+ l = SCM_CDR (l))
+ {
+ SCM sym = SCM_CAAR (l);
+ if (scm_i_symbol_hash (sym) == raw_hash
+ && scm_i_symbol_length (sym) == len)
+ {
+ const char *chrs = scm_i_symbol_chars (sym);
+ size_t i = len;
+
+ while (i != 0)
+ {
+ --i;
+ if (name[i] != chrs[i])
+ goto next_symbol;
+ }
+
+ return sym;
+ }
+ next_symbol:
+ ;
+ }
+
+ return SCM_BOOL_F;
+}
+static SCM
+scm_i_c_mem2symbol (const char *name, size_t len)
+{
+ SCM symbol;
size_t raw_hash = scm_string_hash ((const unsigned char *) name, len);
size_t hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
+ symbol = lookup_interned_symbol (name, len, raw_hash);
+ if (symbol != SCM_BOOL_F)
+ return symbol;
+
{
- /* Try to find the symbol in the symbols table */
+ /* The symbol was not found - create it. */
+ SCM symbol = scm_i_c_make_symbol (name, len, 0, raw_hash,
+ scm_cons (SCM_BOOL_F, SCM_EOL));
- SCM l;
+ SCM slot = SCM_HASHTABLE_BUCKET (symbols, hash);
+ SCM cell = scm_cons (symbol, SCM_UNDEFINED);
+ SCM_SET_HASHTABLE_BUCKET (symbols, hash, scm_cons (cell, slot));
+ SCM_HASHTABLE_INCREMENT (symbols);
+ if (SCM_HASHTABLE_N_ITEMS (symbols) > SCM_HASHTABLE_UPPER (symbols))
+ scm_i_rehash (symbols, scm_i_hash_symbol, 0, "scm_mem2symbol");
- for (l = SCM_HASHTABLE_BUCKET (symbols, hash);
- !scm_is_null (l);
- l = SCM_CDR (l))
- {
- SCM sym = SCM_CAAR (l);
- if (scm_i_symbol_hash (sym) == raw_hash
- && scm_i_symbol_length (sym) == len)
- {
- const char *chrs = scm_i_symbol_chars (sym);
- size_t i = len;
-
- while (i != 0)
- {
- --i;
- if (name[i] != chrs[i])
- goto next_symbol;
- }
-
- return sym;
- }
- next_symbol:
- ;
- }
+ return symbol;
}
+}
+
+static SCM
+scm_i_mem2symbol (SCM str)
+{
+ SCM symbol;
+ const char *name = scm_i_string_chars (str);
+ size_t len = scm_i_string_length (str);
+ size_t raw_hash = scm_string_hash ((const unsigned char *) name, len);
+ size_t hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols);
+
+ symbol = lookup_interned_symbol (name, len, raw_hash);
+ if (symbol != SCM_BOOL_F)
+ return symbol;
{
/* The symbol was not found - create it. */
@@ -139,6 +175,7 @@
}
}
+
static SCM
scm_i_mem2uninterned_symbol (SCM str)
{
@@ -348,13 +385,50 @@
SCM
scm_from_locale_symbol (const char *sym)
{
- return scm_string_to_symbol (scm_from_locale_string (sym));
+ return scm_i_c_mem2symbol (sym, strlen (sym));
}
SCM
scm_from_locale_symboln (const char *sym, size_t len)
{
- return scm_string_to_symbol (scm_from_locale_stringn (sym, len));
+ return scm_i_c_mem2symbol (sym, len);
+}
+
+SCM
+scm_take_locale_symboln (char *sym, size_t len)
+{
+ SCM res;
+ unsigned long raw_hash;
+
+ if (len == (size_t)-1)
+ len = strlen (sym);
+ else
+ {
+ /* Ensure STR is null terminated. A realloc for 1 extra byte should
+ often be satisfied from the alignment padding after the block, with
+ no actual data movement. */
+ sym = scm_realloc (sym, len+1);
+ sym[len] = '\0';
+ }
+
+ raw_hash = scm_string_hash ((unsigned char *)sym, len);
+ res = lookup_interned_symbol (sym, len, raw_hash);
+ if (res != SCM_BOOL_F)
+ {
+ free (sym);
+ return res;
+ }
+
+ res = scm_i_c_take_symbol (sym, len, 0, raw_hash,
+ scm_cons (SCM_BOOL_F, SCM_EOL));
+
+ return res;
+}
+
+SCM
+scm_take_locale_symbol (char *sym)
+{
+ return scm_take_locale_symboln (sym, (size_t)-1);
}
void
--- orig/libguile/symbols.h
+++ mod/libguile/symbols.h
@@ -56,6 +56,8 @@
SCM_API SCM scm_from_locale_symbol (const char *str);
SCM_API SCM scm_from_locale_symboln (const char *str, size_t len);
+SCM_API SCM scm_take_locale_symbol (char *sym);
+SCM_API SCM scm_take_locale_symboln (char *sym, size_t len);
/* internal functions. */
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [PATCH] Improved `scm_from_locale_symbol ()' + `scm_take_locale_symbol ()',
Ludovic Courtès <=