guile-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Guile-commits] GNU Guile branch, master, updated. v2.1.0-45-g7914b2b


From: Andy Wingo
Subject: [Guile-commits] GNU Guile branch, master, updated. v2.1.0-45-g7914b2b
Date: Tue, 25 Oct 2011 22:09:45 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=7914b2b0690ecd65c89a80e2451f44f0e0d64940

The branch, master has been updated
       via  7914b2b0690ecd65c89a80e2451f44f0e0d64940 (commit)
       via  1a04d29db7957ef295ffc1b0be8323ba2889f1a0 (commit)
       via  cd644b54246a889f6f8860feb7d918fc8953178d (commit)
       via  41d1d984aee43bc4964c87229004e1c02a6ddba7 (commit)
       via  25d50a051d8de9c438d6ed910bec13be682b3b12 (commit)
       via  f80d15c59e962d197c0cb8e98fd84bdd27bc449e (commit)
      from  8b33752be7950b66bf0007e282eae3d13502f445 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 7914b2b0690ecd65c89a80e2451f44f0e0d64940
Author: Andy Wingo <address@hidden>
Date:   Wed Oct 26 00:06:53 2011 +0200

    symbols.h reindent
    
    * libguile/symbols.h: Reindent.

commit 1a04d29db7957ef295ffc1b0be8323ba2889f1a0
Author: Andy Wingo <address@hidden>
Date:   Tue Oct 25 23:46:17 2011 +0200

    use bob jenkins' hashword2 hash from lookup3.c for our string hash
    
    * libguile/hash.c (JENKINS_LOOKUP3_HASHWORD2, narrow_string_hash)
      (wide_string_hash, scm_string_hash, scm_i_string_hash)
      (scm_i_latin1_string_hash): Replace our lame string hash with Bob
      Jenkins' hash, treating each codepoint as a word, for the purposes of
      the algorithm.  There are probably more optimal hashes for our use
      cases.
      (scm_i_locale_string_hash): Remove optimization, as it wasn't used.
      (scm_i_utf8_string_hash): Add a specialized implementation for utf8.
      It's tricky but mostly just cut-and-paste.

commit cd644b54246a889f6f8860feb7d918fc8953178d
Author: Andy Wingo <address@hidden>
Date:   Wed Oct 26 00:06:33 2011 +0200

    fix a vhash test
    
    * test-suite/tests/vlist.test ("vhash"): As far as I can tell this test
      was not testing the right thing.

commit 41d1d984aee43bc4964c87229004e1c02a6ddba7
Author: Andy Wingo <address@hidden>
Date:   Tue Oct 25 18:18:39 2011 +0200

    optimize scm_from_utf8_stringn
    
    * libguile/strings.c (decoding_error): Factor out of scm_from_stringn,
      properly handling errno.
      (scm_from_stringn): Adapt.
      (scm_from_utf8_stringn): Inline the conversion here, to avoid going
      through iconv.

commit 25d50a051d8de9c438d6ed910bec13be682b3b12
Author: Andy Wingo <address@hidden>
Date:   Tue Oct 25 17:45:29 2011 +0200

    most uses of scm_from_locale_symbol become scm_from_utf8_symbol
    
    * libguile/array-handle.c:
    * libguile/chars.c:
    * libguile/expand.c:
    * libguile/feature.c:
    * libguile/goops.c:
    * libguile/gsubr.c:
    * libguile/instructions.c:
    * libguile/load.c:
    * libguile/macros.c:
    * libguile/memoize.c:
    * libguile/modules.c:
    * libguile/options.c:
    * libguile/print.c:
    * libguile/smob.c:
    * libguile/snarf.h: Change most uses of scm_from_locale_symbol to
      scm_from_utf8_symbol, as the symbols really are not locale-dependent.

commit f80d15c59e962d197c0cb8e98fd84bdd27bc449e
Author: Andy Wingo <address@hidden>
Date:   Tue Oct 25 17:32:50 2011 +0200

    optimize utf8 symbol lookup
    
    * libguile/symbols.c (utf8_string_equals_wide_string)
      (utf8_lookup_predicate_fn, lookup_interned_utf8_symbol): Optimize
      utf8 symbol lookup.

-----------------------------------------------------------------------

Summary of changes:
 libguile/array-handle.c     |    4 +-
 libguile/chars.c            |    4 +-
 libguile/expand.c           |    4 +-
 libguile/feature.c          |    2 +-
 libguile/goops.c            |    4 +-
 libguile/gsubr.c            |    2 +-
 libguile/hash.c             |  203 ++++++++++++++++++++++++++++++-------------
 libguile/instructions.c     |    2 +-
 libguile/load.c             |    2 +-
 libguile/macros.c           |    2 +-
 libguile/memoize.c          |    2 +-
 libguile/modules.c          |   26 +++---
 libguile/options.c          |    4 +-
 libguile/print.c            |    2 +-
 libguile/smob.c             |    2 +-
 libguile/snarf.h            |    4 +-
 libguile/strings.c          |  107 ++++++++++++++++++++---
 libguile/symbols.c          |   86 ++++++++++++++++++-
 libguile/symbols.h          |    2 +-
 test-suite/tests/vlist.test |    4 +-
 20 files changed, 354 insertions(+), 114 deletions(-)

diff --git a/libguile/array-handle.c b/libguile/array-handle.c
index ec3127a..7114f78 100644
--- a/libguile/array-handle.c
+++ b/libguile/array-handle.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 1995,1996,1997,1998,2000,2001,2002,2003,2004, 2005, 2006, 
2009 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,1997,1998,2000,2001,2002,2003,2004, 2005, 2006, 
2009, 2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -132,7 +132,7 @@ void
 scm_init_array_handle (void)
 {
 #define DEFINE_ARRAY_TYPE(tag, TAG)                             \
-  scm_i_array_element_types[SCM_ARRAY_ELEMENT_TYPE_##TAG] = 
scm_from_locale_symbol (#tag)
+  scm_i_array_element_types[SCM_ARRAY_ELEMENT_TYPE_##TAG] = 
scm_from_utf8_symbol (#tag)
   
   scm_i_array_element_types[SCM_ARRAY_ELEMENT_TYPE_SCM] = SCM_BOOL_T;
   DEFINE_ARRAY_TYPE (a, CHAR);
diff --git a/libguile/chars.c b/libguile/chars.c
index 2e16105..fbedb0f 100644
--- a/libguile/chars.c
+++ b/libguile/chars.c
@@ -1,4 +1,4 @@
-/*     Copyright (C) 1995,1996,1998, 2000, 2001, 2004, 2006, 2008, 2009, 2010 
Free Software Foundation, Inc.
+/*     Copyright (C) 1995,1996,1998, 2000, 2001, 2004, 2006, 2008, 2009, 2010, 
2011 Free Software Foundation, Inc.
  * 
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -492,7 +492,7 @@ SCM_DEFINE (scm_char_general_category, 
"char-general-category", 1, 0, 0,
   sym = uc_general_category_name (cat);
 
   if (sym != NULL)
-    return scm_from_locale_symbol (sym);
+    return scm_from_utf8_symbol (sym);
   return SCM_BOOL_F;
 }
 #undef FUNC_NAME
diff --git a/libguile/expand.c b/libguile/expand.c
index 78dd4ca..3f23d4f 100644
--- a/libguile/expand.c
+++ b/libguile/expand.c
@@ -1214,13 +1214,13 @@ make_exp_vtable (size_t n)
     (scm_string_append (scm_make_list (scm_from_size_t (exp_nfields[n]),
                                        scm_from_locale_string ("pw"))));
   printer = SCM_BOOL_F;
-  name = scm_from_locale_symbol (exp_names[n]);
+  name = scm_from_utf8_symbol (exp_names[n]);
   code = scm_from_size_t (n);
   fields = SCM_EOL;
   {
     size_t m = exp_nfields[n];
     while (m--)
-      fields = scm_cons (scm_from_locale_symbol (exp_field_names[n][m]), 
fields);
+      fields = scm_cons (scm_from_utf8_symbol (exp_field_names[n][m]), fields);
   }
 
   return scm_c_make_struct (scm_exp_vtable_vtable, 0, 5,
diff --git a/libguile/feature.c b/libguile/feature.c
index ebb77cb..ca83421 100644
--- a/libguile/feature.c
+++ b/libguile/feature.c
@@ -43,7 +43,7 @@ void
 scm_add_feature (const char *str)
 {
   SCM old = SCM_VARIABLE_REF (features_var);
-  SCM new = scm_cons (scm_from_locale_symbol (str), old);
+  SCM new = scm_cons (scm_from_utf8_symbol (str), old);
   SCM_VARIABLE_SET (features_var, new);
 }
 
diff --git a/libguile/goops.c b/libguile/goops.c
index 4b09f33..cefc03b 100644
--- a/libguile/goops.c
+++ b/libguile/goops.c
@@ -2308,7 +2308,7 @@ fix_cpl (SCM c, SCM before, SCM after)
 static void
 make_stdcls (SCM *var, char *name, SCM meta, SCM super, SCM slots)
 {
-   SCM tmp = scm_from_locale_symbol (name);
+   SCM tmp = scm_from_utf8_symbol (name);
 
    *var = scm_basic_make_class (meta, tmp,
                                 scm_is_pair (super) ? super : scm_list_1 
(super),
@@ -2514,7 +2514,7 @@ make_class_from_template (char const *template, char 
const *type_name, SCM super
     {
       char buffer[100];
       sprintf (buffer, template, type_name);
-      name = scm_from_locale_symbol (buffer);
+      name = scm_from_utf8_symbol (buffer);
     }
   else
     name = SCM_GOOPS_UNBOUND;
diff --git a/libguile/gsubr.c b/libguile/gsubr.c
index b6f261f..ca92cc5 100644
--- a/libguile/gsubr.c
+++ b/libguile/gsubr.c
@@ -791,7 +791,7 @@ create_gsubr (int define, const char *name,
   scm_t_bits flags;
 
   /* make objtable */
-  sname = scm_from_locale_symbol (name);
+  sname = scm_from_utf8_symbol (name);
   table = scm_c_make_vector (generic_loc ? 3 : 2, SCM_UNDEFINED);
   SCM_SIMPLE_VECTOR_SET (table, 0, scm_from_pointer (fcn, NULL));
   SCM_SIMPLE_VECTOR_SET (table, 1, sname);
diff --git a/libguile/hash.c b/libguile/hash.c
index a79f03d..b620b16 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -45,103 +45,182 @@ extern double floor();
 #endif
 
 
+/* This hash function is originally from
+   http://burtleburtle.net/bob/c/lookup3.c by Bob Jenkins, May 2006,
+   Public Domain.  No warranty.  */
+
+#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
+#define mix(a,b,c) \
+{ \
+  a -= c;  a ^= rot(c, 4);  c += b; \
+  b -= a;  b ^= rot(a, 6);  a += c; \
+  c -= b;  c ^= rot(b, 8);  b += a; \
+  a -= c;  a ^= rot(c,16);  c += b; \
+  b -= a;  b ^= rot(a,19);  a += c; \
+  c -= b;  c ^= rot(b, 4);  b += a; \
+}
+
+#define final(a,b,c) \
+{ \
+  c ^= b; c -= rot(b,14); \
+  a ^= c; a -= rot(c,11); \
+  b ^= a; b -= rot(a,25); \
+  c ^= b; c -= rot(b,16); \
+  a ^= c; a -= rot(c,4);  \
+  b ^= a; b -= rot(a,14); \
+  c ^= b; c -= rot(b,24); \
+}
+
+#define JENKINS_LOOKUP3_HASHWORD2(k, length, ret)                       \
+  do {                                                                  \
+    scm_t_uint32 a, b, c;                                               \
+                                                                        \
+    /* Set up the internal state.  */                                   \
+    a = b = c = 0xdeadbeef + ((scm_t_uint32)(length<<2)) + 47;          \
+                                                                        \
+    /* Handle most of the key.  */                                      \
+    while (length > 3)                                                  \
+      {                                                                 \
+        a += k[0];                                                      \
+        b += k[1];                                                      \
+        c += k[2];                                                      \
+        mix (a, b, c);                                                  \
+        length -= 3;                                                    \
+        k += 3;                                                         \
+      }                                                                 \
+                                                                        \
+    /* Handle the last 3 elements.  */                                  \
+    switch(length) /* All the case statements fall through.  */         \
+      {                                                                 \
+      case 3 : c += k[2];                                               \
+      case 2 : b += k[1];                                               \
+      case 1 : a += k[0];                                               \
+        final (a, b, c);                                                \
+      case 0:     /* case 0: nothing left to add */                     \
+        break;                                                          \
+      }                                                                 \
+                                                                        \
+    if (sizeof (ret) == 8)                                              \
+      ret = (((unsigned long) c) << 32) | b;                            \
+    else                                                                \
+      ret = c;                                                          \
+  } while (0)
+
+
+static unsigned long
+narrow_string_hash (const scm_t_uint8 *str, size_t len)
+{
+  unsigned long ret;
+  JENKINS_LOOKUP3_HASHWORD2 (str, len, ret);
+  ret >>= 2; /* Ensure that it fits in a fixnum.  */
+  return ret;
+}
+
+static unsigned long
+wide_string_hash (const scm_t_wchar *str, size_t len)
+{
+  unsigned long ret;
+  JENKINS_LOOKUP3_HASHWORD2 (str, len, ret);
+  ret >>= 2; /* Ensure that it fits in a fixnum.  */
+  return ret;
+}
+
 unsigned long 
 scm_string_hash (const unsigned char *str, size_t len)
 {
-  /* from suggestion at: */
-  /* http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html */
-
-  unsigned long h = 0;
-  while (len-- > 0)
-    h = *str++ + h*37;
-  return h;
+  return narrow_string_hash (str, len);
 }
 
 unsigned long 
 scm_i_string_hash (SCM str)
 {
   size_t len = scm_i_string_length (str);
-  size_t i = 0;
-
-  unsigned long h = 0;
-  while (len-- > 0)
-    h = (unsigned long) scm_i_string_ref (str, i++) + h * 37;
 
-  scm_remember_upto_here_1 (str);
-  return h;
+  if (scm_i_is_narrow_string (str))
+    return narrow_string_hash ((const scm_t_uint8 *) scm_i_string_chars (str),
+                               len);
+  else
+    return wide_string_hash (scm_i_string_wide_chars (str), len);
 }
 
 unsigned long 
 scm_i_locale_string_hash (const char *str, size_t len)
 {
-#ifdef HAVE_WCHAR_H
-  mbstate_t state;
-  wchar_t c;
-  size_t byte_idx = 0, nbytes;
-  unsigned long h = 0;
-
-  if (len == (size_t) -1)
-    len = strlen (str);
-
-  while ((nbytes = mbrtowc (&c, str + byte_idx, len - byte_idx, &state)) > 0)
-    {
-      if (nbytes >= (size_t) -2)
-        /* Invalid input string; punt.  */
-        return scm_i_string_hash (scm_from_locale_stringn (str, len));
-
-      h = (unsigned long) c + h * 37;
-      byte_idx += nbytes;
-    }
-
-  return h;
-#else
   return scm_i_string_hash (scm_from_locale_stringn (str, len));
-#endif
 }
 
 unsigned long 
 scm_i_latin1_string_hash (const char *str, size_t len)
 {
-  const scm_t_uint8 *ustr = (const scm_t_uint8 *) str;
-  size_t i = 0;
-  unsigned long h = 0;
-  
   if (len == (size_t) -1)
     len = strlen (str);
 
-  for (; i < len; i++)
-    h = (unsigned long) ustr[i] + h * 37;
-
-  return h;
+  return narrow_string_hash ((const scm_t_uint8 *) str, len);
 }
 
+/* A tricky optimization, but probably worth it.  */
 unsigned long 
 scm_i_utf8_string_hash (const char *str, size_t len)
 {
-  const scm_t_uint8 *ustr = (const scm_t_uint8 *) str;
-  size_t byte_idx = 0;
-  unsigned long h = 0;
-  
+  const scm_t_uint8 *end, *ustr = (const scm_t_uint8 *) str;
+  unsigned long ret;
+
+  /* The length of the string in characters.  This name corresponds to
+     Jenkins' original name.  */
+  size_t length;
+
+  scm_t_uint32 a, b, c, u32;
+
   if (len == (size_t) -1)
     len = strlen (str);
 
-  while (byte_idx < len)
+  end = ustr + len;
+
+  if (u8_check (ustr, len) != NULL)
+    /* Invalid UTF-8; punt.  */
+    return scm_i_string_hash (scm_from_utf8_stringn (str, len));
+
+  length = u8_strnlen (ustr, len);
+
+  /* Set up the internal state.  */
+  a = b = c = 0xdeadbeef + ((scm_t_uint32)(length<<2)) + 47;
+
+  /* Handle most of the key.  */
+  while (length > 3)
     {
-      ucs4_t c;
-      int nbytes;
-
-      nbytes = u8_mbtouc (&c, ustr + byte_idx, len - byte_idx);
-      if (nbytes == 0)
-        break;
-      else if (nbytes < 0)
-        /* Bad UTF-8; punt.  */
-        return scm_i_string_hash (scm_from_utf8_stringn (str, len));
-
-      h = (unsigned long) c + h * 37;
-      byte_idx += nbytes;
+      ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
+      a += u32;
+      ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
+      b += u32;
+      ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
+      c += u32;
+      mix (a, b, c);
+      length -= 3;
     }
 
-  return h;
+  /* Handle the last 3 elements's.  */
+  ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
+  a += u32;
+  if (--length)
+    {
+      ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
+      b += u32;
+      if (--length)
+        {
+          ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
+          c += u32;
+        }
+    }
+
+  final (a, b, c);
+
+  if (sizeof (unsigned long) == 8)
+    ret = (((unsigned long) c) << 32) | b;
+  else
+    ret = c;
+
+  ret >>= 2; /* Ensure that it fits in a fixnum.  */
+  return ret;
 }
 
 
diff --git a/libguile/instructions.c b/libguile/instructions.c
index ef4a9ce..f3b8963 100644
--- a/libguile/instructions.c
+++ b/libguile/instructions.c
@@ -72,7 +72,7 @@ fetch_instruction_table ()
         {
           table[i].opcode = i;
           if (table[i].name)
-            table[i].symname = scm_from_locale_symbol (table[i].name);
+            table[i].symname = scm_from_utf8_symbol (table[i].name);
           else
             table[i].symname = SCM_BOOL_F;
         }
diff --git a/libguile/load.c b/libguile/load.c
index c6e6887..e5126ed 100644
--- a/libguile/load.c
+++ b/libguile/load.c
@@ -998,7 +998,7 @@ init_build_info ()
 
   for (i = 0; i < (sizeof (info) / sizeof (info[0])); i++)
     {
-      SCM key = scm_from_locale_symbol (info[i].name);
+      SCM key = scm_from_utf8_symbol (info[i].name);
       SCM val = scm_from_locale_string (info[i].value);
       *loc = scm_acons (key, val, *loc);
     }
diff --git a/libguile/macros.c b/libguile/macros.c
index bf351e4..5a4b9d6 100644
--- a/libguile/macros.c
+++ b/libguile/macros.c
@@ -64,7 +64,7 @@ scm_i_make_primitive_macro (const char *name, 
scm_t_macro_primitive fn)
 {
   SCM z = scm_words (scm_tc16_macro, 5);
   SCM_SET_SMOB_DATA_N (z, 1, (scm_t_bits)fn);
-  SCM_SET_SMOB_OBJECT_N (z, 2, scm_from_locale_symbol (name));
+  SCM_SET_SMOB_OBJECT_N (z, 2, scm_from_utf8_symbol (name));
   SCM_SET_SMOB_OBJECT_N (z, 3, SCM_BOOL_F);
   SCM_SET_SMOB_OBJECT_N (z, 4, SCM_BOOL_F);
   return z;
diff --git a/libguile/memoize.c b/libguile/memoize.c
index 801088c..e5ed629 100644
--- a/libguile/memoize.c
+++ b/libguile/memoize.c
@@ -618,7 +618,7 @@ unmemoize_lexical (SCM n)
   char buf[16];
   buf[15] = 0;
   snprintf (buf, 15, "<%u>", scm_to_uint32 (n));
-  return scm_from_locale_symbol (buf);
+  return scm_from_utf8_symbol (buf);
 }
 
 static SCM
diff --git a/libguile/modules.c b/libguile/modules.c
index 63268fb..7498549 100644
--- a/libguile/modules.c
+++ b/libguile/modules.c
@@ -153,7 +153,7 @@ convert_module_name (const char *name)
        ptr++;
       if (ptr > name)
        {
-         SCM sym = scm_from_locale_symboln (name, ptr-name);
+         SCM sym = scm_from_utf8_symboln (name, ptr-name);
          *tail = scm_cons (sym, SCM_EOL);
          tail = SCM_CDRLOC (*tail);
        }
@@ -218,7 +218,7 @@ scm_c_export (const char *name, ...)
   if (name)
     {
       va_list ap;
-      SCM names = scm_cons (scm_from_locale_symbol (name), SCM_EOL);
+      SCM names = scm_cons (scm_from_utf8_symbol (name), SCM_EOL);
       SCM *tail = SCM_CDRLOC (names);
       va_start (ap, name);
       while (1)
@@ -226,7 +226,7 @@ scm_c_export (const char *name, ...)
          const char *n = va_arg (ap, const char *);
          if (n == NULL)
            break;
-         *tail = scm_cons (scm_from_locale_symbol (n), SCM_EOL);
+         *tail = scm_cons (scm_from_utf8_symbol (n), SCM_EOL);
          tail = SCM_CDRLOC (*tail);
        }
       va_end (ap);
@@ -734,7 +734,7 @@ scm_sym2var (SCM sym, SCM proc, SCM definep)
 SCM
 scm_c_module_lookup (SCM module, const char *name)
 {
-  return scm_module_lookup (module, scm_from_locale_symbol (name));
+  return scm_module_lookup (module, scm_from_utf8_symbol (name));
 }
 
 SCM
@@ -754,7 +754,7 @@ scm_module_lookup (SCM module, SCM sym)
 SCM
 scm_c_lookup (const char *name)
 {
-  return scm_lookup (scm_from_locale_symbol (name));
+  return scm_lookup (scm_from_utf8_symbol (name));
 }
 
 SCM
@@ -807,14 +807,14 @@ SCM
 scm_c_public_variable (const char *module_name, const char *name)
 {
   return scm_public_variable (convert_module_name (module_name),
-                              scm_from_locale_symbol (name));
+                              scm_from_utf8_symbol (name));
 }
 
 SCM
 scm_c_private_variable (const char *module_name, const char *name)
 {
   return scm_private_variable (convert_module_name (module_name),
-                               scm_from_locale_symbol (name));
+                               scm_from_utf8_symbol (name));
 }
 
 SCM
@@ -849,14 +849,14 @@ SCM
 scm_c_public_lookup (const char *module_name, const char *name)
 {
   return scm_public_lookup (convert_module_name (module_name),
-                            scm_from_locale_symbol (name));
+                            scm_from_utf8_symbol (name));
 }
 
 SCM
 scm_c_private_lookup (const char *module_name, const char *name)
 {
   return scm_private_lookup (convert_module_name (module_name),
-                             scm_from_locale_symbol (name));
+                             scm_from_utf8_symbol (name));
 }
 
 SCM
@@ -875,20 +875,20 @@ SCM
 scm_c_public_ref (const char *module_name, const char *name)
 {
   return scm_public_ref (convert_module_name (module_name),
-                         scm_from_locale_symbol (name));
+                         scm_from_utf8_symbol (name));
 }
 
 SCM
 scm_c_private_ref (const char *module_name, const char *name)
 {
   return scm_private_ref (convert_module_name (module_name),
-                          scm_from_locale_symbol (name));
+                          scm_from_utf8_symbol (name));
 }
 
 SCM
 scm_c_module_define (SCM module, const char *name, SCM value)
 {
-  return scm_module_define (module, scm_from_locale_symbol (name), value);
+  return scm_module_define (module, scm_from_utf8_symbol (name), value);
 }
 
 SCM
@@ -907,7 +907,7 @@ scm_module_define (SCM module, SCM sym, SCM value)
 SCM
 scm_c_define (const char *name, SCM value)
 {
-  return scm_define (scm_from_locale_symbol (name), value);
+  return scm_define (scm_from_utf8_symbol (name), value);
 }
 
 SCM_DEFINE (scm_define, "define!", 2, 0, 0,
diff --git a/libguile/options.c b/libguile/options.c
index 286d9e1..8eecd35 100644
--- a/libguile/options.c
+++ b/libguile/options.c
@@ -133,7 +133,7 @@ get_documented_option_setting (const scm_t_option options[])
 
   for (i = 0; options[i].name; ++i)
     {
-      SCM ls = scm_cons (scm_from_locale_string (options[i].doc), SCM_EOL);
+      SCM ls = scm_cons (scm_from_utf8_string (options[i].doc), SCM_EOL);
       switch (options[i].type)
        {
        case SCM_OPTION_BOOLEAN:
@@ -278,7 +278,7 @@ scm_init_opts (SCM (*func) (SCM), scm_t_option options[])
 
   for (i = 0; options[i].name; ++i)
     {
-      SCM name = scm_from_locale_symbol (options[i].name);
+      SCM name = scm_from_utf8_symbol (options[i].name);
       options[i].name =        (char *) SCM_UNPACK (name);
     }
   func (SCM_UNDEFINED);
diff --git a/libguile/print.c b/libguile/print.c
index e462d12..8846cd3 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -432,7 +432,7 @@ scm_i_print_symbol_name (SCM sym, SCM port)
 void
 scm_print_symbol_name (const char *str, size_t len, SCM port)
 {
-  SCM symbol = scm_from_locale_symboln (str, len);
+  SCM symbol = scm_from_utf8_symboln (str, len);
   scm_i_print_symbol_name (symbol, port);
 }
 
diff --git a/libguile/smob.c b/libguile/smob.c
index ad58301..02ad1a5 100644
--- a/libguile/smob.c
+++ b/libguile/smob.c
@@ -438,7 +438,7 @@ scm_i_smob_apply_trampoline (SCM smob)
         name = "smob-apply";
       objtable = scm_c_make_vector (2, SCM_UNDEFINED);
       SCM_SIMPLE_VECTOR_SET (objtable, 0, smob);
-      SCM_SIMPLE_VECTOR_SET (objtable, 1, scm_from_locale_symbol (name));
+      SCM_SIMPLE_VECTOR_SET (objtable, 1, scm_from_utf8_symbol (name));
       tramp = scm_make_program (SCM_SMOB_DESCRIPTOR 
(smob).apply_trampoline_objcode,
                                 objtable, SCM_BOOL_F);
 
diff --git a/libguile/snarf.h b/libguile/snarf.h
index 4aaff7c..3931570 100644
--- a/libguile/snarf.h
+++ b/libguile/snarf.h
@@ -209,11 +209,11 @@ SCM_SNARF_INIT(                                           
                \
 
 # define SCM_SYMBOL(c_name, scheme_name)                               \
 SCM_SNARF_HERE(static SCM c_name)                                      \
-SCM_SNARF_INIT(c_name = scm_from_locale_symbol (scheme_name))
+SCM_SNARF_INIT(c_name = scm_from_utf8_symbol (scheme_name))
 
 # define SCM_GLOBAL_SYMBOL(c_name, scheme_name)                                
\
 SCM_SNARF_HERE(SCM c_name)                                             \
-SCM_SNARF_INIT(c_name = scm_from_locale_symbol (scheme_name))
+SCM_SNARF_INIT(c_name = scm_from_utf8_symbol (scheme_name))
 
 #endif /* !SCM_SUPPORT_STATIC_ALLOCATION */
 
diff --git a/libguile/strings.c b/libguile/strings.c
index d3490a9..b996301 100644
--- a/libguile/strings.c
+++ b/libguile/strings.c
@@ -1446,6 +1446,23 @@ scm_decoding_error (const char *subr, int err, const 
char *message, SCM port)
 
 /* String conversion to/from C.  */
 
+static void
+decoding_error (const char *func_name, int errno_save,
+                const char *str, size_t len)
+{
+  /* Raise an error and pass the raw C string as a bytevector to the `throw'
+     handler.  */
+  SCM bv;
+  signed char *buf;
+
+  buf = scm_gc_malloc_pointerless (len, "bytevector");
+  memcpy (buf, str, len);
+  bv = scm_c_take_gc_bytevector (buf, len, SCM_BOOL_F);
+
+  scm_decoding_error (func_name, errno_save,
+                      "input locale conversion error", bv);
+}
+
 SCM
 scm_from_stringn (const char *str, size_t len, const char *encoding,
                   scm_t_string_failed_conversion_handler handler)
@@ -1481,19 +1498,7 @@ scm_from_stringn (const char *str, size_t len, const 
char *encoding,
                                                 NULL, &u32len);
 
   if (SCM_UNLIKELY (u32 == NULL))
-    {
-      /* Raise an error and pass the raw C string as a bytevector to the 
`throw'
-        handler.  */
-      SCM bv;
-      signed char *buf;
-
-      buf = scm_gc_malloc_pointerless (len, "bytevector");
-      memcpy (buf, str, len);
-      bv = scm_c_take_gc_bytevector (buf, len, SCM_BOOL_F);
-
-      scm_decoding_error (__func__, errno,
-                         "input locale conversion error", bv);
-    }
+    decoding_error (__func__, errno, str, len);
 
   i = 0;
   while (i < u32len)
@@ -1567,7 +1572,81 @@ scm_from_utf8_string (const char *str)
 SCM
 scm_from_utf8_stringn (const char *str, size_t len)
 {
-  return scm_from_stringn (str, len, "UTF-8", SCM_FAILED_CONVERSION_ERROR);
+  size_t i, char_len;
+  const scm_t_uint8 *ustr = (const scm_t_uint8 *) str;
+  int ascii = 1, narrow = 1;
+  SCM res;
+
+  if (len == (size_t) -1)
+    len = strlen (str);
+
+  i = 0;
+  char_len = 0;
+
+  while (i < len)
+    {
+      if (ustr[i] <= 127)
+        {
+          char_len++;
+          i++;
+        }
+      else
+        {
+          ucs4_t c;
+          int nbytes;
+
+          ascii = 0;
+
+          nbytes = u8_mbtouc (&c, ustr + i, len - i);
+
+          if (nbytes < 0)
+            /* Bad UTF-8.  */
+            decoding_error (__func__, errno, str, len);
+
+          if (c > 255)
+            narrow = 0;
+          
+          char_len++;
+          i += nbytes;
+        }
+    }
+  
+  if (ascii)
+    {
+      char *dst;
+      res = scm_i_make_string (char_len, &dst, 0);
+      memcpy (dst, str, len);
+    }
+  else if (narrow)
+    {
+      char *dst;
+      size_t j;
+      ucs4_t c;
+
+      res = scm_i_make_string (char_len, &dst, 0);
+
+      for (i = 0, j = 0; i < len; i++, j++)
+        {
+          i += u8_mbtouc_unsafe (&c, ustr + i, len - i);
+          dst[j] = (signed char) c;
+        }
+    }
+  else
+    {
+      scm_t_wchar *dst;
+      size_t j;
+      ucs4_t c;
+
+      res = scm_i_make_wide_string (char_len, &dst, 0);
+
+      for (i = 0, j = 0; i < len; i++, j++)
+        {
+          i += u8_mbtouc_unsafe (&c, ustr + i, len - i);
+          dst[j] = c;
+        }
+    }
+
+  return res;
 }
 
 SCM
diff --git a/libguile/symbols.c b/libguile/symbols.c
index 498e46c..9cb300a 100644
--- a/libguile/symbols.c
+++ b/libguile/symbols.c
@@ -23,6 +23,8 @@
 #  include <config.h>
 #endif
 
+#include <unistr.h>
+
 #include "libguile/_scm.h"
 #include "libguile/chars.h"
 #include "libguile/eval.h"
@@ -144,6 +146,73 @@ lookup_interned_latin1_symbol (const char *str, size_t len,
                                 &data, SCM_BOOL_F);
 }
 
+struct utf8_lookup_data
+{
+  const char *str;
+  size_t len;
+  unsigned long string_hash;
+};
+
+static int
+utf8_string_equals_wide_string (const scm_t_uint8 *narrow, size_t nlen,
+                                const scm_t_wchar *wide, size_t wlen)
+{
+  size_t byte_idx = 0, char_idx = 0;
+  
+  while (byte_idx < nlen && char_idx < wlen)
+    {
+      ucs4_t c;
+      int nbytes;
+
+      nbytes = u8_mbtouc (&c, narrow + byte_idx, nlen - byte_idx);
+      if (nbytes == 0)
+        break;
+      else if (nbytes < 0)
+        /* Bad UTF-8.  */
+        return 0;
+      else if (c != wide[char_idx])
+        return 0;
+
+      byte_idx += nbytes;
+      char_idx++;
+    }
+
+  return byte_idx == nlen && char_idx == wlen;
+}
+
+static int
+utf8_lookup_predicate_fn (SCM sym, void *closure)
+{
+  struct utf8_lookup_data *data = closure;
+
+  if (scm_i_symbol_hash (sym) != data->string_hash)
+    return 0;
+  
+  if (scm_i_is_narrow_symbol (sym))
+    return (scm_i_symbol_length (sym) == data->len
+            && strncmp (scm_i_symbol_chars (sym), data->str, data->len) == 0);
+  else
+    return utf8_string_equals_wide_string ((const scm_t_uint8 *) data->str,
+                                           data->len,
+                                           scm_i_symbol_wide_chars (sym),
+                                           scm_i_symbol_length (sym));
+}
+
+static SCM
+lookup_interned_utf8_symbol (const char *str, size_t len,
+                             unsigned long raw_hash)
+{
+  struct utf8_lookup_data data;
+
+  data.str = str;
+  data.len = len;
+  data.string_hash = raw_hash;
+  
+  return scm_c_weak_set_lookup (symbols, raw_hash,
+                                utf8_lookup_predicate_fn,
+                                &data, SCM_BOOL_F);
+}
+
 static int
 symbol_lookup_predicate_fn (SCM sym, void *closure)
 {
@@ -459,8 +528,21 @@ scm_from_utf8_symbol (const char *sym)
 SCM
 scm_from_utf8_symboln (const char *sym, size_t len)
 {
-  SCM str = scm_from_utf8_stringn (sym, len);
-  return scm_i_str2symbol (str);
+  unsigned long hash;
+  SCM ret;
+
+  if (len == (size_t) -1)
+    len = strlen (sym);
+  hash = scm_i_utf8_string_hash (sym, len);
+
+  ret = lookup_interned_utf8_symbol (sym, len, hash);
+  if (scm_is_false (ret))
+    {
+      SCM str = scm_from_utf8_stringn (sym, len);
+      ret = scm_i_str2symbol (str);
+    }
+
+  return ret;
 }
 
 void
diff --git a/libguile/symbols.h b/libguile/symbols.h
index 94d3003..f345e70 100644
--- a/libguile/symbols.h
+++ b/libguile/symbols.h
@@ -90,7 +90,7 @@ SCM_API SCM scm_take_utf8_symboln (char *sym, size_t len);
 /* internal functions. */
 
 SCM_INTERNAL unsigned long scm_i_hash_symbol (SCM obj, unsigned long n,
-                                        void *closure);
+                                              void *closure);
 
 SCM_INTERNAL void scm_symbols_prehistory (void);
 SCM_INTERNAL void scm_init_symbols (void);
diff --git a/test-suite/tests/vlist.test b/test-suite/tests/vlist.test
index d939284..d9bbbeb 100644
--- a/test-suite/tests/vlist.test
+++ b/test-suite/tests/vlist.test
@@ -220,11 +220,11 @@
       (and (fold (lambda (k v result)
                    (and result
                         (equal? (cons k v)
-                                (vhash-assq k vh))))
+                                (vhash-assoc k vh))))
                  #t
                  keys
                  values)
-           (not (vhash-assq 'x vh)))))
+           (not (vhash-assoc 'x vh)))))
 
   (pass-if "vhash as vlist"
     (let* ((keys   '(a b c d e f g h i))


hooks/post-receive
-- 
GNU Guile



reply via email to

[Prev in Thread] Current Thread [Next in Thread]