
[Guile-commits] GNU Guile branch, master, updated. release_1-9-2-158-g87

From: Michael Gran
Subject: [Guile-commits] GNU Guile branch, master, updated. release_1-9-2-158-g8748ffe
Date: Sat, 05 Sep 2009 17:44:27 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

The branch, master has been updated
       via  8748ffeaa770ed47192f970ef5302a7c7aa7a935 (commit)
      from  28cc8dac2f520fa9de29e93dca52e4892b945a3c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 8748ffeaa770ed47192f970ef5302a7c7aa7a935
Author: Michael Gran <address@hidden>
Date:   Sat Sep 5 10:42:15 2009 -0700

    Doc updates for character encoding of source code files
    * NEWS
    * doc/ref/scheme-scripts.texi: doc updates for character encoding of
      source code
    * doc/ref/api-evaluation.texi: doc updates for character encoding of
      source code


Summary of changes:
 NEWS                        |   12 +++++++
 doc/ref/api-evaluation.texi |   70 +++++++++++++++++++++++++++++++++++++++++++
 doc/ref/scheme-scripts.texi |    6 ++++
 3 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index a3c4ddd..147d082 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,18 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
 Changes in 1.9.3 (since the 1.9.2 prerelease):
+** Non-ASCII source code files can be read, but require coding
+   declarations
+The default reader now handles source code files for some of the
+non-ASCII character encodings, such as UTF-8.  A non-ASCII source file
+should have an encoding declaration near the top of the file.  Also,
+there is a new function file-encoding that scans a port for a coding
+declaration.
+The pre-1.9.3 reader handled 8-bit clean but otherwise unspecified source
+code.  This use is now discouraged.
 ** Ports do transcoding
 Ports now have an associated character encoding, and port read/write
diff --git a/doc/ref/api-evaluation.texi b/doc/ref/api-evaluation.texi
index d841215..9fc5ef5 100644
--- a/doc/ref/api-evaluation.texi
+++ b/doc/ref/api-evaluation.texi
@@ -17,6 +17,7 @@ loading, evaluating, and compiling Scheme code at run time.
 * Fly Evaluation::              Procedures for on the fly evaluation.
 * Compilation::                 How to compile Scheme files and procedures.
 * Loading::                     Loading Scheme code from file.
+* Character Encoding of Source Files:: Loading non-ASCII Scheme code from file.
 * Delayed Evaluation::          Postponing evaluation until it is needed.
 * Local Evaluation::            Evaluation in a local environment.
 * Evaluator Behaviour::         Modifying Guile's evaluator.
@@ -229,6 +230,12 @@ Thus a Guile script often starts like this.
 More details on Guile scripting can be found in the scripting section
 (@pxref{Guile Scripting}).
+There is one special case where the contents of a comment can actually
+affect the interpretation of code.  When a character encoding
+declaration, such as @code{coding: utf-8}, appears in one of the first
+few lines of a source file, it indicates to Guile's default reader
+that this source code file is not ASCII.  For details see @ref{Character
+Encoding of Source Files}.
 @node Case Sensitivity
 @subsubsection Case Sensitivity
@@ -590,6 +597,69 @@ a file to load.  By default, @code{%load-extensions} is bound to the
 list @code{("" ".scm")}.
 @end defvar
+@node Character Encoding of Source Files
+@subsection Character Encoding of Source Files
address@hidden primitive-load
address@hidden load
+Scheme source code files are usually encoded in ASCII, but the
+built-in reader can interpret other character encodings.  The
+procedure @code{primitive-load}, and by extension the functions that
+call it, such as @code{load}, first scan the top 500 characters of the
+file for a coding declaration.
+A coding declaration has the form @code{coding: XXXXXX}, where
+@code{XXXXXX} is the name of a character encoding in which the source
+code file has been encoded.  The coding declaration must appear in a
+Scheme comment.  It can be either a semicolon-initiated comment or a block
+@code{#!} comment.
+The name of the character encoding in the coding declaration is
+typically lower case and contains only letters, numbers, and
+hyphens.  The most common examples of character encodings are
+@code{utf-8} and @code{iso-8859-1}.  This naming convention keeps the
+coding declaration compatible with Emacs.
+For source code, only a subset of all possible character encodings can
+be interpreted by the built-in source code reader.  Only those
+character encodings in which ASCII text appears unmodified can be
+used.  This includes @code{UTF-8} and @code{ISO-8859-1} through
+@code{ISO-8859-15}.  The multi-byte character encodings @code{UTF-16}
+and @code{UTF-32} may not be used because they are not compatible with
+ASCII.
address@hidden read
address@hidden set-port-encoding!
+There might be a scenario in which one would want to read non-ASCII
+code from a port, such as with the function @code{read}, instead of
+with @code{load}.  If the port's character encoding is the same as the
+encoding of the code to be read by the port, no other special
+handling is necessary.  The port will automatically do the character
+encoding conversion.  The functions @code{setlocale} and
+@code{set-port-encoding!} are used to set port encodings.
+If a port is used to read code of unknown character encoding, this can
+be accomplished in three steps.  First, the character encoding of the
+port should be set to ISO-8859-1 using @code{set-port-encoding!}.
+Then, the procedure @code{file-encoding}, described below, is used to
+scan for a coding declaration when reading from the port.  As a side
+effect, it rewinds the port after its scan is complete. After that,
+the port's character encoding should be set to the encoding returned
+by @code{file-encoding}, if any, again by using
+@code{set-port-encoding!}.  Then the code can be read as normal.
+@deffn {Scheme Procedure} file-encoding port
+@deffnx {C Function} scm_file_encoding port
+Scans the port for an Emacs-like character coding declaration near the
+top of the contents of a port with randomly accessible contents.  The
+coding declaration is of the form @code{coding: XXXXX} and must appear
+in a Scheme comment.
+Returns a string containing the character encoding of the file
+if a declaration was found, or @code{#f} otherwise.  The port is
+rewound.
+@end deffn
 @node Delayed Evaluation
 @subsection Delayed Evaluation
diff --git a/doc/ref/scheme-scripts.texi b/doc/ref/scheme-scripts.texi
index e12eee6..249bc34 100644
--- a/doc/ref/scheme-scripts.texi
+++ b/doc/ref/scheme-scripts.texi
@@ -64,6 +64,12 @@ operating system never reads this far, but Guile treats this as the end
 of the comment begun on the first line by the @samp{#!} characters.
+If this source code file is not ASCII or ISO-8859-1 encoded, a coding
+declaration such as @code{coding: utf-8} should appear in a comment
+somewhere in the first five lines of the file: see @ref{Character
+Encoding of Source Files}.
 The rest of the file should be a Scheme program.
 @end itemize
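
The three-step procedure that the new api-evaluation.texi text describes for reading code of unknown encoding from a port can be sketched roughly as follows. This is only an illustration and not part of the commit: the names read-all-forms and demo-file are hypothetical, and the sketch assumes a Guile in which file-encoding and set-port-encoding! behave as documented in the diff above, and that the port is a seekable file port.

```scheme
;; Sketch only.  read-all-forms and demo-file are hypothetical names,
;; not part of Guile or of this commit.
(define (read-all-forms port)
  ;; Step 1: read as ISO-8859-1, where every byte is a valid character.
  (set-port-encoding! port "ISO-8859-1")
  ;; Step 2: scan for a coding declaration.  file-encoding rewinds the port.
  (let ((encoding (file-encoding port)))
    ;; Step 3: switch to the declared encoding, if one was found.
    (if encoding
        (set-port-encoding! port encoding))
    ;; Read forms until end of file.
    (let loop ((form (read port)) (forms '()))
      (if (eof-object? form)
          (reverse forms)
          (loop (read port) (cons form forms))))))

;; Write a small UTF-8 file whose first line is a coding declaration.
(define demo-file "coding-demo.scm")
(call-with-output-file demo-file
  (lambda (out)
    (set-port-encoding! out "UTF-8")
    (display ";; -*- coding: utf-8 -*-\n" out)
    (write '(define greeting "caf\xe9") out)
    (newline out)))

;; Read the file back through the three steps.
(define forms (call-with-input-file demo-file read-all-forms))
```

If no declaration is found, the sketch simply leaves the port in ISO-8859-1, which is one possible fallback; callers that expect another default would set it explicitly.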
