guile-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.0-174-g2e9fc


From: Andy Wingo
Subject: [Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.0-174-g2e9fc9f
Date: Mon, 11 Apr 2011 11:47:53 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=2e9fc9fc73a8157152e6b2e122ec545d96478c2a

The branch, stable-2.0 has been updated
       via  2e9fc9fc73a8157152e6b2e122ec545d96478c2a (commit)
       via  d9527cfafdad1046770437a7a59d3745e7243c67 (commit)
       via  15671c6e7fd86160b415b5373b2c1539e23556f3 (commit)
       via  adf43b3f081878860ed1d4d5091b9a432b44da90 (commit)
      from  882c89636a2a4afa26cff17c7cdbc1d8c1cb2745 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 2e9fc9fc73a8157152e6b2e122ec545d96478c2a
Author: Andy Wingo <address@hidden>
Date:   Mon Apr 11 13:38:27 2011 +0200

    symbols with odd characters print better in #{}#
    
    * libguile/print.c (symbol_has_extended_read_syntax): Use a more
      general, unicode-appropriate algorithm.  Hopefully doesn't cause
      any current #{}# cases to be unescaped.
      (print_extended_symbol): Use more appropriate unicode algorithm, and
      emit unicode hex escapes instead of our own lame escapes.
    
    * test-suite/tests/symbols.test: Add tests.

commit d9527cfafdad1046770437a7a59d3745e7243c67
Author: Andy Wingo <address@hidden>
Date:   Mon Apr 11 12:48:06 2011 +0200

    read-extended-symbol handles backslash better, including r6rs hex escapes
    
    * libguile/read.c (scm_read_extended_symbol): Interpret '\' as an escape
      character.  Due to some historical oddities we have to support '\'
      before any character, but since we never emitted '\' in front of
      "normal" characters like 'x' we can interpret "\x..;" to be an R6RS
      hex escape.
    
    * test-suite/tests/reader.test ("#{}#"): Add tests.

commit 15671c6e7fd86160b415b5373b2c1539e23556f3
Author: Andy Wingo <address@hidden>
Date:   Mon Apr 11 11:52:35 2011 +0200

    refactor scm_i_print_symbol_name
    
    * libguile/print.c (symbol_has_extended_read_syntax)
      (print_normal_symbol, print_extended_symbol, scm_i_print_symbol_name):
      Factor scm_i_print_symbol_name into separate routines.  Add comments.
      There are a number of bugs here.

commit adf43b3f081878860ed1d4d5091b9a432b44da90
Author: Andy Wingo <address@hidden>
Date:   Mon Apr 11 10:13:48 2011 +0200

    ignore SIGPIPE in (system repl server)
    
    * module/system/repl/server.scm (run-server): Ignore SIGPIPE when we run
      a server, as otherwise a rudely disconnected client could cause the
      server to quit.  Thanks to John Proctor for the report, and Detlev
      Zundel for the debugging.

-----------------------------------------------------------------------

Summary of changes:
 libguile/print.c              |  194 ++++++++++++++++++++++-------------------
 libguile/read.c               |   55 ++++++++++--
 module/system/repl/server.scm |    3 +-
 test-suite/tests/reader.test  |   12 +++
 test-suite/tests/symbols.test |    7 ++-
 5 files changed, 173 insertions(+), 98 deletions(-)

diff --git a/libguile/print.c b/libguile/print.c
index e3c9e1c..1399566 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -309,15 +309,10 @@ print_circref (SCM port, scm_print_state *pstate, SCM ref)
 /* Print the name of a symbol. */
 
 static int
-quote_keywordish_symbol (SCM symbol)
+quote_keywordish_symbols (void)
 {
-  SCM option;
+  SCM option = SCM_PRINT_KEYWORD_STYLE;
 
-  if (scm_i_symbol_ref (symbol, 0) != ':'
-      && scm_i_symbol_ref (symbol, scm_i_symbol_length (symbol) - 1) !=  ':')
-    return 0;
-
-  option = SCM_PRINT_KEYWORD_STYLE;
   if (scm_is_false (option))
     return 0;
   if (scm_is_eq (option, sym_reader))
@@ -325,91 +320,114 @@ quote_keywordish_symbol (SCM symbol)
   return 1;
 }
 
-void
-scm_i_print_symbol_name (SCM str, SCM port)
+#define INITIAL_IDENTIFIER_MASK                                      \
+  (UC_CATEGORY_MASK_Lu | UC_CATEGORY_MASK_Ll | UC_CATEGORY_MASK_Lt   \
+   | UC_CATEGORY_MASK_Lm | UC_CATEGORY_MASK_Lo | UC_CATEGORY_MASK_Mn \
+   | UC_CATEGORY_MASK_Nl | UC_CATEGORY_MASK_No | UC_CATEGORY_MASK_Pd \
+   | UC_CATEGORY_MASK_Pc | UC_CATEGORY_MASK_Po | UC_CATEGORY_MASK_Sc \
+   | UC_CATEGORY_MASK_Sm | UC_CATEGORY_MASK_Sk | UC_CATEGORY_MASK_So \
+   | UC_CATEGORY_MASK_Co)
+
+#define SUBSEQUENT_IDENTIFIER_MASK                                      \
+  (INITIAL_IDENTIFIER_MASK                                              \
+   | UC_CATEGORY_MASK_Nd | UC_CATEGORY_MASK_Mc | UC_CATEGORY_MASK_Me)
+
+static int
+symbol_has_extended_read_syntax (SCM sym)
 {
-  /* This points to the first character that has not yet been written to the
-   * port. */
-  size_t pos = 0;
-  /* This points to the character we're currently looking at. */
-  size_t end;
-  /* If the name contains weird characters, we'll escape them with
-   * backslashes and set this flag; it indicates that we should surround the
-   * name with "#{" and "}#". */
-  int weird = 0;
-  /* Backslashes are not sufficient to make a name weird, but if a name is
-   * weird because of other characters, backslahes need to be escaped too.
-   * The first time we see a backslash, we set maybe_weird, and mw_pos points
-   * to the backslash.  Then if the name turns out to be weird, we re-process
-   * everything starting from mw_pos.
-   * We could instead make backslashes always weird.  This is not necessary
-   * to ensure that the output is (read)-able, but it would make this code
-   * simpler and faster. */
-  int maybe_weird = 0;
-  size_t mw_pos = 0;
-  size_t len = scm_i_symbol_length (str);
-  scm_t_wchar str0 = scm_i_symbol_ref (str, 0);
-
-  if (len == 0 || str0 == '\'' || str0 == '`' || str0 == ','
-      || quote_keywordish_symbol (str) 
-      || (str0 == '.' && len == 1)
-      || scm_is_true (scm_i_string_to_number (scm_symbol_to_string (str), 10)))
+  size_t pos, len = scm_i_symbol_length (sym);
+  scm_t_wchar c;
+
+  /* The empty symbol.  */
+  if (len == 0)
+    return 1;
+
+  c = scm_i_symbol_ref (sym, 0);
+
+  /* Single dot; conflicts with dotted-pair notation.  */
+  if (len == 1 && c == '.')
+    return 1;
+
+  /* Other initial-character constraints.  */
+  if (c == '\'' || c == '`' || c == ',' || c == '"' || c == ';' || c == '#')
+    return 1;
+  
+  /* Keywords can be identified by trailing colons too.  */
+  if (c == ':' || scm_i_symbol_ref (sym, len - 1) == ':')
+    return quote_keywordish_symbols ();
+  
+  /* Number-ish symbols.  */
+  if (scm_is_true (scm_i_string_to_number (scm_symbol_to_string (sym), 10)))
+    return 1;
+  
+  /* Other disallowed first characters.  */
+  if (!uc_is_general_category_withtable (c, INITIAL_IDENTIFIER_MASK))
+    return 1;
+
+  /* Otherwise, any character that's in the identifier category mask is
+     fine to pass through as-is, provided it's not one of the ASCII
+     delimiters like `;'.  */
+  for (pos = 1; pos < len; pos++)
     {
-      scm_lfwrite ("#{", 2, port);
-      weird = 1;
+      c = scm_i_symbol_ref (sym, pos);
+      if (!uc_is_general_category_withtable (c, SUBSEQUENT_IDENTIFIER_MASK))
+        return 1;
+      else if (c == '"' || c == ';' || c == '#')
+        return 1;
     }
 
-  for (end = pos; end < len; ++end)
-    switch (scm_i_symbol_ref (str, end))
-      {
-#ifdef BRACKETS_AS_PARENS
-      case '[':
-      case ']':
-#endif
-      case '(':
-      case ')':
-      case '"':
-      case ';':
-      case '#':
-      case SCM_WHITE_SPACES:
-      case SCM_LINE_INCREMENTORS:
-      weird_handler:
-       if (maybe_weird)
-         {
-           end = mw_pos;
-           maybe_weird = 0;
-         }
-       if (!weird)
-         {
-           scm_lfwrite ("#{", 2, port);
-           weird = 1;
-         }
-       if (pos < end)
-         scm_lfwrite_substr (scm_symbol_to_string (str), pos, end, port);
-       {
-         char buf[2];
-         buf[0] = '\\';
-         buf[1] = (char) (unsigned char) scm_i_symbol_ref (str, end);
-         scm_lfwrite (buf, 2, port);
-       }
-       pos = end + 1;
-       break;
-      case '\\':
-       if (weird)
-         goto weird_handler;
-       if (!maybe_weird)
-         {
-           maybe_weird = 1;
-           mw_pos = pos;
-         }
-       break;
-      default:
-       break;
-      }
-  if (pos < end)
-    scm_lfwrite_substr (scm_symbol_to_string (str), pos, end, port);
-  if (weird)
-    scm_lfwrite ("}#", 2, port);
+  return 0;
+}
+
+static void
+print_normal_symbol (SCM sym, SCM port)
+{
+  scm_display (scm_symbol_to_string (sym), port);
+}
+
+static void
+print_extended_symbol (SCM sym, SCM port)
+{
+  size_t pos, len;
+  scm_t_string_failed_conversion_handler strategy;
+
+  len = scm_i_symbol_length (sym);
+  strategy = scm_i_get_conversion_strategy (port);
+
+  scm_lfwrite ("#{", 2, port);
+
+  for (pos = 0; pos < len; pos++)
+    {
+      scm_t_wchar c = scm_i_symbol_ref (sym, pos);
+      
+      if (uc_is_general_category_withtable (c,
+                                            SUBSEQUENT_IDENTIFIER_MASK
+                                            | UC_CATEGORY_MASK_Zs))
+        {
+          if (!display_character (c, port, strategy))
+            scm_encoding_error ("print_extended_symbol", errno,
+                                "cannot convert to output locale",
+                                port, SCM_MAKE_CHAR (c));
+        }
+      else
+        {
+          display_string ("\\x", 1, 2, port, iconveh_question_mark);
+          scm_intprint (c, 16, port);
+          display_character (';', port, iconveh_question_mark);
+        }
+    }
+
+  scm_lfwrite ("}#", 2, port);
+}
+
+/* FIXME: allow R6RS hex escapes instead of #{...}#.  */
+void
+scm_i_print_symbol_name (SCM sym, SCM port)
+{
+  if (symbol_has_extended_read_syntax (sym))
+    print_extended_symbol (sym, port);
+  else
+    print_normal_symbol (sym, port);
 }
 
 void
diff --git a/libguile/read.c b/libguile/read.c
index a05a86d..4b6828b 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1230,7 +1230,7 @@ scm_read_extended_symbol (scm_t_wchar chr, SCM port)
        #{This is all a symbol name}#
 
      So here, CHR is expected to be `{'.  */
-  int saw_brace = 0, finished = 0;
+  int saw_brace = 0;
   size_t len = 0;
   SCM buf = scm_i_make_string (1024, NULL, 0);
 
@@ -1242,20 +1242,57 @@ scm_read_extended_symbol (scm_t_wchar chr, SCM port)
        {
          if (chr == '#')
            {
-             finished = 1;
              break;
            }
          else
            {
              saw_brace = 0;
              scm_i_string_set_x (buf, len++, '}');
-             scm_i_string_set_x (buf, len++, chr);
            }
        }
-      else if (chr == '}')
+
+      if (chr == '}')
        saw_brace = 1;
+      else if (chr == '\\')
+        {
+          /* It used to be that print.c would print extended-read-syntax
+             symbols with backslashes before "non-standard" chars, but
+             this routine wouldn't do anything with those escapes.
+             Bummer.  What we've done is to change print.c to output
+             R6RS hex escapes for those characters, relying on the fact
+             that the extended read syntax would never put a `\' before
+             an `x'.  For now, we just ignore other instances of
+             backslash in the string.  */
+          switch ((chr = scm_getc (port)))
+            {
+            case EOF:
+              goto done;
+            case 'x':
+              {
+                scm_t_wchar c;
+                
+                SCM_READ_HEX_ESCAPE (10, ';');
+                scm_i_string_set_x (buf, len++, c);
+                break;
+
+              str_eof:
+                chr = EOF;
+                goto done;
+
+              bad_escaped:
+                scm_i_string_stop_writing ();
+                scm_i_input_error ("scm_read_extended_symbol", port,
+                                   "illegal character in escape sequence: ~S",
+                                   scm_list_1 (SCM_MAKE_CHAR (c)));
+                break;
+              }
+            default:
+             scm_i_string_set_x (buf, len++, chr);
+              break;
+            }
+        }
       else
-       scm_i_string_set_x (buf, len++, chr);
+        scm_i_string_set_x (buf, len++, chr);
 
       if (len >= scm_i_string_length (buf) - 2)
        {
@@ -1267,11 +1304,13 @@ scm_read_extended_symbol (scm_t_wchar chr, SCM port)
          len = 0;
          buf = scm_i_string_start_writing (buf);
        }
-
-      if (finished)
-       break;
     }
+
+ done:
   scm_i_string_stop_writing ();
+  if (chr == EOF)
+    scm_i_input_error ("scm_read_extended_symbol", port,
+                       "end of file while reading symbol", SCM_EOL);
 
   return (scm_string_to_symbol (scm_c_substring (buf, 0, len)));
 }
diff --git a/module/system/repl/server.scm b/module/system/repl/server.scm
index 132ea81..ec90677 100644
--- a/module/system/repl/server.scm
+++ b/module/system/repl/server.scm
@@ -1,6 +1,6 @@
 ;;; Repl server
 
-;; Copyright (C)  2003, 2010 Free Software Foundation, Inc.
+;; Copyright (C)  2003, 2010, 2011 Free Software Foundation, Inc.
 
 ;; This library is free software; you can redistribute it and/or
 ;; modify it under the terms of the GNU Lesser General Public
@@ -103,6 +103,7 @@
           (sleep 1)
           (accept-new-client))))))
   
+  (sigaction SIGPIPE SIG_IGN)
   (add-open-socket! server-socket)
   (listen server-socket 5)
   (let lp ((client (accept-new-client)))
diff --git a/test-suite/tests/reader.test b/test-suite/tests/reader.test
index 1d6cc41..7027d32 100644
--- a/test-suite/tests/reader.test
+++ b/test-suite/tests/reader.test
@@ -36,6 +36,8 @@
   (cons 'read-error "Unknown # object: .*$"))
 (define exception:eof-in-string
   (cons 'read-error "end of file in string constant$"))
+(define exception:eof-in-symbol
+  (cons 'read-error "end of file while reading symbol$"))
 (define exception:illegal-escape
   (cons 'read-error "illegal character in escape sequence: .*$"))
 (define exception:missing-expression
@@ -424,6 +426,16 @@
      ("#,foo" . (unsyntax foo))
      ("#,@foo" . (unsyntax-splicing foo)))))
 
+(with-test-prefix "#{}#"
+  (pass-if (equal? (read-string "#{}#") '#{}#))
+  (pass-if (equal? (read-string "#{a}#") 'a))
+  (pass-if (equal? (read-string "#{a b}#") '#{a b}#))
+  (begin-deprecated
+   (pass-if (equal? (read-string "#{a\\ b}#") '#{a b}#)))
+  (pass-if-exception "#{" exception:eof-in-symbol
+                     (read-string "#{"))
+  (pass-if (equal? (read-string "#{a\\x20;b}#") '#{a b}#)))
+
 
 ;;; Local Variables:
 ;;; eval: (put 'with-read-options 'scheme-indent-function 1)
diff --git a/test-suite/tests/symbols.test b/test-suite/tests/symbols.test
index c87aa21..6fbc6be 100644
--- a/test-suite/tests/symbols.test
+++ b/test-suite/tests/symbols.test
@@ -1,6 +1,6 @@
 ;;;; symbols.test --- test suite for Guile's symbols    -*- scheme -*-
 ;;;;
-;;;; Copyright (C) 2001, 2006, 2008, 2009 Free Software Foundation, Inc.
+;;;; Copyright (C) 2001, 2006, 2008, 2009, 2011 Free Software Foundation, Inc.
 ;;;; 
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -151,3 +151,8 @@
   (pass-if "accepts embedded NULs"
     (> (string-length (symbol->string (gensym 
"foo\0bar\0braz\0foo\0bar\0braz\0foo\0bar\0braz\0foo\0bar\0braz\0foo\0bar\0braz\0foo\0bar\0braz\0")))
 6)))
 
+(with-test-prefix "extended read syntax"
+  (pass-if (equal? "#{}#" (object->string (string->symbol ""))))
+  (pass-if (equal? "a" (object->string (string->symbol "a"))))
+  (pass-if (equal? "#{a b}#" (object->string (string->symbol "a b"))))
+  (pass-if (equal? "#{\\x7d;}#" (object->string (string->symbol "}")))))


hooks/post-receive
-- 
GNU Guile



reply via email to

[Prev in Thread] Current Thread [Next in Thread]