[Guile-commits] GNU Guile branch, master, updated. release_1-9-2-157-g28


From: Michael Gran
Subject: [Guile-commits] GNU Guile branch, master, updated. release_1-9-2-157-g28cc8da
Date: Fri, 04 Sep 2009 14:57:37 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=28cc8dac2f520fa9de29e93dca52e4892b945a3c

The branch, master has been updated
       via  28cc8dac2f520fa9de29e93dca52e4892b945a3c (commit)
       via  18d8fcd43c8ea6b0122453b2d9f7ac10c1f36d6c (commit)
       via  25ebc0340d30d1ceb786dbc8c3fe80c6e9ae0e87 (commit)
       via  3d03f9395e2ca83183e846ee99d4f9e541771c20 (commit)
      from  5f5e7a2cd6db0a7068f00710d0cca340c043c0ea (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 28cc8dac2f520fa9de29e93dca52e4892b945a3c
Author: Michael Gran <address@hidden>
Date:   Fri Sep 4 07:55:05 2009 -0700

    Doc updates for Unicode string escapes and port encodings
    
    * NEWS: string and port changes
    
    * doc/ref/api-data.texi: string escapes and string-ci
    
    * doc/ref/api-io.texi: port encoding functions
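
The port-encoding text this commit adds to doc/ref/api-io.texi names three conversion strategies for a character that a port's encoding cannot represent: raise an error, substitute, or escape. As a rough illustration only, the sketch below applies those three choices to one code point bound for an ISO-8859-1 port. It is not Guile's implementation: `conv_strategy`, `latin1_encode`, the plain `?` substitute, and the `\uHHHH` / `\UHHHHHH` escape spelling (borrowed from the string escapes in the api-data.texi hunk) are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; these are not libguile identifiers. */
enum conv_strategy { CONV_ERROR, CONV_SUBSTITUTE, CONV_ESCAPE };

/* Encode code point CP for an ISO-8859-1 port into OUT as a
   NUL-terminated string.  Return the number of bytes written (not
   counting the NUL), or -1 if the strategy is CONV_ERROR and CP is
   not representable, or if OUT is too small. */
int
latin1_encode (uint32_t cp, enum conv_strategy strategy,
               char *out, size_t out_size)
{
  int n;

  if (out_size < 2)
    return -1;

  /* ISO-8859-1 maps code points 0..255 directly to single bytes. */
  if (cp <= 0xFF)
    {
      out[0] = (char) (unsigned char) cp;
      out[1] = '\0';
      return 1;
    }

  switch (strategy)
    {
    case CONV_SUBSTITUTE:
      /* Assumption: always use '?', with no "approximate" characters. */
      out[0] = '?';
      out[1] = '\0';
      return 1;

    case CONV_ESCAPE:
      /* Assumption: escapes are spelled like the string escapes. */
      if (cp <= 0xFFFF)
        n = snprintf (out, out_size, "\\u%04x", (unsigned) cp);
      else
        n = snprintf (out, out_size, "\\U%06x", (unsigned) cp);
      return (n < 0 || (size_t) n >= out_size) ? -1 : n;

    case CONV_ERROR:
    default:
      return -1;
    }
}
```

With a 16-byte buffer, U+00E9 is written as the single byte 0xE9 under every strategy, while U+0100 yields -1, `?`, or the six characters `\u0100` depending on the strategy chosen.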

commit 18d8fcd43c8ea6b0122453b2d9f7ac10c1f36d6c
Author: Michael Gran <address@hidden>
Date:   Fri Sep 4 07:34:35 2009 -0700

    Remove locale u8vector functions
    
    Locale u8vector functions deemed harmful.
    
    * libguile/strports.c (scm_strport_to_locale_u8vector)
      (scm_call_with_output_locale_u8vector, scm_open_input_locale_u8vector)
      (scm_get_output_locale_u8vector): removed
    
    * libguile/strports.h: removed declarations for
      scm_strport_to_locale_u8vector,
      scm_call_with_output_u8vector,
      scm_input_locale_u8vector,
      scm_get_output_locale_u8vector
    
    * test-suite/tests/encoding-iso88591.test: display tests removed
    
    * test-suite/tests/encoding-iso88597.test: display tests removed

commit 25ebc0340d30d1ceb786dbc8c3fe80c6e9ae0e87
Author: Michael Gran <address@hidden>
Date:   Fri Sep 4 07:30:13 2009 -0700

    Initialize string ports with UTF-8 encoding
    
    String ports should be able to accept any string characters, regardless
    of the current locale.  Setting it to UTF-8 achieves that.
    
    * libguile/strports.c (scm_i_mkstrport): set port's locale to UTF-8
      (scm_mkstrport): convert input string to UTF-8
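
To illustrate why UTF-8 lets a string port hold any string character regardless of locale, here is a minimal standalone sketch that encodes one UCS-4 code point as UTF-8. The helper name `ucs4_to_utf8` is hypothetical and is not part of libguile, which performs its own conversion (the strports.c hunk in this message calls scm_to_stringn).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a libguile function.
   Encode code point CP as UTF-8 into OUT, which must hold 4 bytes.
   Return the number of bytes written, or 0 if CP is a surrogate or
   lies beyond U+10FFFF and therefore is not a valid character. */
size_t
ucs4_to_utf8 (uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80)
    {
      /* ASCII: one byte, unchanged. */
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      /* Two bytes: 110xxxxx 10xxxxxx. */
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not characters */
  if (cp < 0x10000)
    {
      /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx. */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      /* Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. */
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;                     /* beyond the Unicode range */
}
```

For example, U+0100 comes out as the two bytes 0xC4 0x80, whereas an ISO-8859-1 locale has no byte for it at all.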

commit 3d03f9395e2ca83183e846ee99d4f9e541771c20
Author: Michael Gran <address@hidden>
Date:   Fri Sep 4 07:27:14 2009 -0700

    write-char should handle UCS-4 characters
    
    * libguile/print.c (scm_write_char): call UCS-4 printing routine, instead
      of 8-bit primitive

-----------------------------------------------------------------------

Summary of changes:
 NEWS                                    |   11 ++++
 doc/ref/api-data.texi                   |   19 ++++++-
 doc/ref/api-io.texi                     |   84 +++++++++++++++++++++++++++---
 libguile/print.c                        |    4 +-
 libguile/strports.c                     |   85 +++----------------------------
 libguile/strports.h                     |    4 --
 test-suite/tests/encoding-iso88591.test |   21 --------
 test-suite/tests/encoding-iso88597.test |   21 --------
 8 files changed, 113 insertions(+), 136 deletions(-)

diff --git a/NEWS b/NEWS
index 955075b..a3c4ddd 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,17 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
 
 Changes in 1.9.3 (since the 1.9.2 prerelease):
 
+** Ports do transcoding
+
+Ports now have an associated character encoding, and port read/write
+operations do conversion to/from locales automatically.  Ports also
+have an associated strategy for how to deal with locale conversion
+failures.  Four functions to support this: set-port-encoding!,
+port-encoding, set-port-conversion-strategy!,
+port-conversion-strategy.
+
+** String and SRFI-13 functions can operate on Unicode strings
+
 ** SRFI-14 char-sets are modified for Unicode
 
 The default char-sets are not longer locale dependent and contain
diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index 5cbf4b1..cf0d321 100755
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -2690,6 +2690,14 @@ Vertical tab character (ASCII 11).
 @item @nicode{\xHH}
 Character code given by two hexadecimal digits.  For example
 @nicode{\x7f} for an ASCII DEL (127).
+
+@item @nicode{\uHHHH}
+Character code given by four hexadecimal digits.  For example
+@nicode{\u0100} for a capital A with macron (U+0100).
+
+@item @nicode{\UHHHHHH}
+Character code given by six hexadecimal digits.  For example
address@hidden
 @end table
 
 @noindent
@@ -3110,9 +3118,14 @@ The procedures in this section are similar to the character ordering
 predicates (@pxref{Characters}), but are defined on character sequences.
 
 The first set is specified in R5RS and has names that end in @code{?}.
-The second set is specified in SRFI-13 and the names have no ending
-@code{?}.  The predicates ending in @code{-ci} ignore the character case
-when comparing strings.  @xref{Text Collation, the @code{(ice-9
+The second set is specified in SRFI-13 and the names have not ending
+@code{?}.  
+
+The predicates ending in @code{-ci} ignore the character case
+when comparing strings.  For now, case-insensitive comparison is done
+using the R5RS rules, where every lower-case character that has a
+single character upper-case form is converted to uppercase before
+comparison.  See @xref{Text Collation, the @code{(ice-9
 i18n)} module}, for locale-dependent string comparison.
 
 @rnindex string=?
diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 96cd147..83a2fd7 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -47,7 +47,7 @@ are two interesting and powerful examples of this technique.
 
 Ports are garbage collected in the usual way (@pxref{Memory
 Management}), and will be closed at that time if not already closed.
-In this case any errors occuring in the close will not be reported.
+In this case any errors occurring in the close will not be reported.
 Usually a program will want to explicitly close so as to be sure all
 its operations have been successful.  Of course if a program has
 abandoned something due to an error or other condition then closing
@@ -70,6 +70,18 @@ All file access uses the ``LFS'' large file support functions when
 available, so files bigger than 2 Gbytes (@math{2^31} bytes) can be
 read and written on a 32-bit system.
 
+Each port has an associated character encoding that controls how bytes
+read from the port are converted to characters and string and controls
+how characters and strings written to the port are converted to bytes.
+When ports are created, they inherit their character encoding from the
+current locale, but, that can be modified after the port is created.
+
+Each port also has an associated conversion strategy: what to do when
+a Guile character can't be converted to the port's encoded character
+representation for output. There are three possible strategies: to
+raise an error, to replace the character with a hex escape, or to
+replace the character with a substitute character.
+
 @rnindex input-port?
 @deffn {Scheme Procedure} input-port? x
 @deffnx {C Function} scm_input_port_p (x)
@@ -93,6 +105,55 @@ Equivalent to @code{(or (input-port? @var{x}) (output-port?
 @var{x}))}.
 @end deffn
 
+@deffn {Scheme Procedure} set-port-encoding! port enc
+@deffnx {C Function} scm_set_port_encoding_x (port, enc)
+Sets the character encoding that will be used to interpret all port
+I/O.  @var{enc} is a string containing the name of an encoding.  
+@end deffn
+
+New ports are created with the encoding appropriate for the current
+locale if @code{setlocale} has been called or ISO-8859-1 otherwise,
+and this procedure can be used to modify that encoding.
+
+@deffn {Scheme Procedure} port-encoding port
+@deffnx {C Function} scm_port_encoding
+Returns, as a string, the character encoding that @var{port} uses to
+interpret its input and output.
+@end deffn
+
+@deffn {Scheme Procedure} set-port-conversion-strategy! port sym
+@deffnx {C Function} scm_set_port_conversion_strategy_x (port, sym)
+Sets the behavior of the interpreter when outputting a character that
+is not representable in the port's current encoding.  @var{sym} can be
+either @code{'error}, @code{'substitute}, or @code{'escape}.  If it is
+@code{'error}, an error will be thrown when an nonconvertible character
+is encountered.  If it is @code{'substitute}, then nonconvertible
+characters will be replaced with approximate characters, or with
+question marks if no approximately correct character is available.  If
+it is @code{'escape}, it will appear as a hex escape when output.
+
+If @var{port} is an open port, the conversion error behavior
+is set for that port.  If it is @code{#f}, it is set as the
+default behavior for any future ports that get created in
+this thread.
+@end deffn
+
+@deffn {Scheme Procedure} port-conversion-strategy port
+@deffnx {C Function} scm_port_conversion_strategy (port)
+Returns the behavior of the port when outputting a character that is
+not representable in the port's current encoding.  It returns the
+symbol @code{error} if unrepresentable characters should cause
+exceptions, @code{substitute} if the port should try to replace
+unrepresentable characters with question marks or approximate
+characters, or @code{escape} if unrepresentable characters should be
+converted to string escapes.
+
+If @var{port} is @code{#f}, then the current default behavior will be
+returned.  New ports will have this default behavior when they are
+created.
+@end deffn
+
+
 
 @node Reading
 @subsection Reading
@@ -238,7 +299,7 @@ output port if not given.
 
 The output is designed to be machine readable, and can be read back
 with @code{read} (@pxref{Reading}).  Strings are printed in
-doublequotes, with escapes if necessary, and characters are printed in
+double quotes, with escapes if necessary, and characters are printed in
 @samp{#\} notation.
 @end deffn
 
@@ -248,7 +309,7 @@ Send a representation of @var{obj} to @var{port} or to the current
 output port if not given.
 
 The output is designed for human readability, it differs from
-@code{write} in that strings are printed without doublequotes and
+@code{write} in that strings are printed without double quotes and
 escapes, and characters are printed as per @code{write-char}, not in
 @samp{#\} form.
 @end deffn
@@ -496,7 +557,7 @@ used.  This function is equivalent to:
 @end lisp
 @end deffn
 
-Some of the abovementioned I/O functions rely on the following C
+Some of the aforementioned I/O functions rely on the following C
 primitives.  These will mainly be of interest to people hacking Guile
 internals.
 
@@ -815,11 +876,11 @@ Open @var{filename} for output.  Equivalent to
 Open @var{filename} for input or output, and call @code{(@var{proc}
 port)} with the resulting port.  Return the value returned by
 @var{proc}.  @var{filename} is opened as per @code{open-input-file} or
-@code{open-output-file} respectively, and an error is signalled if it
+@code{open-output-file} respectively, and an error is signaled if it
 cannot be opened.
 
 When @var{proc} returns, the port is closed.  If @var{proc} does not
-return (eg.@: if it throws an error), then the port might not be
+return (e.g.@: if it throws an error), then the port might not be
 closed automatically, though it will be garbage collected in the usual
 way if not otherwise referenced.
 @end deffn
@@ -834,7 +895,7 @@ setup as respectively the @code{current-input-port},
 @code{current-output-port}, or @code{current-error-port}.  Return the
 value returned by @var{thunk}.  @var{filename} is opened as per
 @code{open-input-file} or @code{open-output-file} respectively, and an
-error is signalled if it cannot be opened.
+error is signaled if it cannot be opened.
 
 When @var{thunk} returns, the port is closed and the previous setting
 of the respective current port is restored.
@@ -891,6 +952,13 @@ Determine whether @var{obj} is a port that is related to a file.
 The following allow string ports to be opened by analogy to R4R*
 file port facilities:
 
+With string ports, the port-encoding is treated differently than other
+types of ports.  When string ports are created, they do not inherit a
+character encoding from the current locale.  They are given a
+default locale that allows them to handle all valid string characters.
+Typically one should not modify a string port's character encoding
+away from its default.
+
 @deffn {Scheme Procedure} call-with-output-string proc
 @deffnx {C Function} scm_call_with_output_string (proc)
 Calls the one-argument procedure @var{proc} with a newly created output
@@ -1409,7 +1477,7 @@ is set.
 
 @node Port Implementation
 @subsubsection Port Implementation
-@cindex Port implemenation
+@cindex Port implementation
 
 This section describes how to implement a new port type in C.
 
diff --git a/libguile/print.c b/libguile/print.c
index f4826d4..23e48e3 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -1216,8 +1216,8 @@ SCM_DEFINE (scm_write_char, "write-char", 1, 1, 0,
 
   SCM_VALIDATE_CHAR (1, chr);
   SCM_VALIDATE_OPORT_VALUE (2, port);
-
-  scm_putc ((int) SCM_CHAR (chr), SCM_COERCE_OUTPORT (port));
+  
+  scm_i_charprint (SCM_CHAR (chr), SCM_COERCE_OUTPORT (port));
 #if 0
 #ifdef HAVE_PIPE
 # ifdef EPIPE
diff --git a/libguile/strports.c b/libguile/strports.c
index 5bfeaad..82895ac 100644
--- a/libguile/strports.c
+++ b/libguile/strports.c
@@ -290,7 +290,7 @@ st_truncate (SCM port, scm_t_off length)
 }
 
 SCM 
-scm_i_mkstrport (SCM pos, const char *locale_str, size_t str_len, long modes, const char *caller)
+scm_i_mkstrport (SCM pos, const char *utf8_str, size_t str_len, long modes, const char *caller)
 {
   SCM z, str;
   scm_t_port *pt;
@@ -305,7 +305,7 @@ scm_i_mkstrport (SCM pos, const char *locale_str, size_t str_len, long modes, co
 
      locale_str is already in the locale of the port.  */
   str = scm_i_make_string (str_len, &buf);
-  memcpy (buf, locale_str, str_len);
+  memcpy (buf, utf8_str, str_len);
 
   c_pos = scm_to_unsigned_integer (pos, 0, str_len);
 
@@ -323,12 +323,14 @@ scm_i_mkstrport (SCM pos, const char *locale_str, size_t str_len, long modes, co
   pt->write_end = pt->read_end = pt->read_buf + pt->read_buf_size;
 
   pt->rw_random = 1;
-
   scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
 
   /* ensure write_pos is writable. */
   if ((modes & SCM_WRTNG) && pt->write_pos == pt->write_end)
     st_flush (z);
+
+  scm_i_set_port_encoding_x (z, "UTF-8");
+  scm_i_set_conversion_strategy_x (z, SCM_FAILED_CONVERSION_ERROR);
   return z;
 }
 
@@ -349,9 +351,9 @@ scm_mkstrport (SCM pos, SCM str, long modes, const char *caller)
      internal encoding of characters in strings is in unicode
      codepoints. */
 
-  /* Ports are initialized with the thread-default values for encoding and
-     invalid sequence handling.  */
-  buf = scm_to_locale_stringn (str, &str_len);
+  /* String ports are are always initialized with "UTF-8" as their
+     encoding.  */
+  buf = scm_to_stringn (str, &str_len, "UTF-8", SCM_FAILED_CONVERSION_ERROR);
   z = scm_i_mkstrport (pos, buf, str_len, modes, caller);
   free (buf);
   return z;
@@ -384,24 +386,6 @@ SCM scm_strport_to_string (SCM port)
   return str;
 }
 
-/* Create a vector containing the locale representation of the string in the
-   port's buffer.  */
-SCM scm_strport_to_locale_u8vector (SCM port)
-{
-  scm_t_port *pt = SCM_PTAB_ENTRY (port);
-  SCM vec;
-  char *buf;
-  
-  if (pt->rw_active == SCM_PORT_WRITE)
-    st_flush (port);
-
-  buf = scm_malloc (pt->read_buf_size);
-  memcpy (buf, pt->read_buf, pt->read_buf_size);
-  vec = scm_take_u8vector ((unsigned char *) buf, pt->read_buf_size);
-  scm_remember_upto_here_1 (port);
-  return vec;
-}
-
 SCM_DEFINE (scm_object_to_string, "object->string", 1, 1, 0,
            (SCM obj, SCM printer),
            "Return a Scheme string obtained by printing @var{obj}.\n"
@@ -426,25 +410,6 @@ SCM_DEFINE (scm_object_to_string, "object->string", 1, 1, 0,
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_call_with_output_locale_u8vector, "call-with-output-locale-u8vector", 1, 0, 0, 
-           (SCM proc),
-           "Calls the one-argument procedure @var{proc} with a newly created output\n"
-           "port.  When the function returns, a vector containing the bytes of a\n"
-           "locale representation of the characters written into the port is returned\n")
-#define FUNC_NAME s_scm_call_with_output_locale_u8vector
-{
-  SCM p;
-
-  p = scm_mkstrport (SCM_INUM0, 
-                    scm_make_string (SCM_INUM0, SCM_UNDEFINED),
-                    SCM_OPN | SCM_WRTNG,
-                     FUNC_NAME);
-  scm_call_1 (proc, p);
-
-  return scm_get_output_locale_u8vector (p);
-}
-#undef FUNC_NAME
-
SCM_DEFINE (scm_call_with_output_string, "call-with-output-string", 1, 0, 0, 
            (SCM proc),
            "Calls the one-argument procedure @var{proc} with a newly created output\n"
@@ -489,27 +454,6 @@ SCM_DEFINE (scm_open_input_string, "open-input-string", 1, 0, 0,
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_open_input_locale_u8vector, "open-input-locale-u8vector", 1, 0, 0,
-           (SCM vec),
-           "Take a u8vector containing the bytes of a string encoded in the\n"
-           "current locale and return an input port that delivers characters\n"
-           "from the string. The port can be closed by\n"
-           "@code{close-input-port}, though its storage will be reclaimed\n"
-           "by the garbage collector if it becomes inaccessible.")
-#define FUNC_NAME s_scm_open_input_locale_u8vector
-{
-  scm_t_array_handle hnd;
-  ssize_t inc;
-  size_t len;
-  const scm_t_uint8 *buf;
-
-  buf = scm_u8vector_elements (vec, &hnd, &len, &inc);
-  SCM p = scm_i_mkstrport(SCM_INUM0, (const char *) buf, len, SCM_OPN | SCM_RDNG, FUNC_NAME);
-  scm_array_handle_release (&hnd);
-  return p;
-}
-#undef FUNC_NAME
-
 SCM_DEFINE (scm_open_output_string, "open-output-string", 0, 0, 0, 
            (void),
            "Return an output port that will accumulate characters for\n"
@@ -542,19 +486,6 @@ SCM_DEFINE (scm_get_output_string, "get-output-string", 1, 0, 0,
 #undef FUNC_NAME
 
 
-SCM_DEFINE (scm_get_output_locale_u8vector, "get-output-locale-u8vector", 1, 0, 0, 
-           (SCM port),
-           "Given an output port created by @code{open-output-string},\n"
-           "return a u8 vector containing the characters of the string\n"
-           "encoded in the current locale.")
-#define FUNC_NAME s_scm_get_output_locale_u8vector
-{
-  SCM_VALIDATE_OPOUTSTRPORT (1, port);
-  return scm_strport_to_locale_u8vector (port);
-}
-#undef FUNC_NAME
-
-
 /* Given a null-terminated string EXPR containing a Scheme expression
    read it, and return it as an SCM value. */
 SCM
diff --git a/libguile/strports.h b/libguile/strports.h
index b2ded01..d93266a 100644
--- a/libguile/strports.h
+++ b/libguile/strports.h
@@ -47,16 +47,12 @@ SCM_API SCM scm_mkstrport (SCM pos, SCM str, long modes, const char * caller);
 SCM_INTERNAL SCM scm_i_mkstrport (SCM pos, const char *locale_str, size_t str_len, 
                                  long modes, const char *caller);
 SCM_API SCM scm_strport_to_string (SCM port);
-SCM_API SCM scm_strport_to_locale_u8vector (SCM port);
 SCM_API SCM scm_object_to_string (SCM obj, SCM printer);
 SCM_API SCM scm_call_with_output_string (SCM proc);
-SCM_API SCM scm_call_with_output_locale_u8vector (SCM proc);
 SCM_API SCM scm_call_with_input_string (SCM str, SCM proc);
 SCM_API SCM scm_open_input_string (SCM str);
-SCM_API SCM scm_open_input_locale_u8vector (SCM str);
 SCM_API SCM scm_open_output_string (void);
 SCM_API SCM scm_get_output_string (SCM port);
-SCM_API SCM scm_get_output_locale_u8vector (SCM port);
 SCM_API SCM scm_c_read_string (const char *expr);
 SCM_API SCM scm_c_eval_string (const char *expr);
 SCM_API SCM scm_c_eval_string_in_module (const char *expr, SCM module);
diff --git a/test-suite/tests/encoding-iso88591.test b/test-suite/tests/encoding-iso88591.test
index 8e85436..b4d48a6 100644
--- a/test-suite/tests/encoding-iso88591.test
+++ b/test-suite/tests/encoding-iso88591.test
@@ -145,27 +145,6 @@
           (list= eqv? (string->list s4)
                  (list #\¿ #\C #\ó #\m #\o #\?))))
 
-;; Check that the output is in ISO-8859-1 encoding
-(with-test-prefix "display"
- 
-  (pass-if "s1"
-          (let ((pt (open-output-string)))
-            (set-port-encoding! pt "ISO-8859-1")
-            (display s1 pt)
-            (list= eqv? 
-                   (list #xfa #x6c #x74 #x69 #x6d #x61)
-                   (u8vector->list
-                    (get-output-locale-u8vector pt)))))
-
-  (pass-if "s2"
-          (let ((pt (open-output-string)))
-            (set-port-encoding! pt "ISO-8859-1")
-            (display s2 pt)
-            (list= eqv? 
-                   (list #x63 #xe9 #x64 #x75 #x6c #x61)
-                   (u8vector->list
-                    (get-output-locale-u8vector pt))))))
-
 (with-test-prefix "symbols == strings"
 
   (pass-if "última"
diff --git a/test-suite/tests/encoding-iso88597.test b/test-suite/tests/encoding-iso88597.test
index 9f278f1..8c155d2 100644
--- a/test-suite/tests/encoding-iso88597.test
+++ b/test-suite/tests/encoding-iso88597.test
@@ -142,27 +142,6 @@
           (list= eqv? (string->list s4)
                  (list #\ê #\á #\é))))
 
-;; Testing that the display of the string is output in the ISO-8859-7
-;; encoding
-(with-test-prefix "display"
- 
-  (pass-if "s1"
-          (let ((pt (open-output-string)))
-            (set-port-encoding! pt "ISO-8859-7")
-            (display s1 pt)
-            (list= eqv? 
-                   (list #xd0 #xe5 #xf1 #xdf)
-                   (u8vector->list 
-                    (get-output-locale-u8vector pt)))))
-  (pass-if "s2"
-          (let ((pt (open-output-string)))
-            (set-port-encoding! pt "ISO-8859-7")
-            (display s2 pt)
-            (list= eqv? 
-                   (list #xf4 #xe7 #xf2)
-                   (u8vector->list 
-                    (get-output-locale-u8vector pt))))))
-
 (with-test-prefix "symbols == strings"
 
   (pass-if "Ðåñß"


hooks/post-receive
-- 
GNU Guile