guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCHES] Keyword args for file openers; coding scan off by default


From: Mark H Weaver
Subject: Re: [PATCHES] Keyword args for file openers; coding scan off by default
Date: Sun, 07 Apr 2013 12:33:17 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Hi Ludovic,

Thanks for the quick reviews.  I pushed the first two patches, after
incorporating your suggestions.

address@hidden (Ludovic Courtès) writes:

> Looks good.
>
> Minor comments:
>
>> address@hidden {Scheme Procedure} open-file filename mode @
>> +                          [#:guess-encoding=#f] [#:encoding=#f]
>> address@hidden {C Function} scm_open_file_with_encoding @
>> +                     (filename, mode, guess_encoding, encoding)
>>  @deffnx {C Function} scm_open_file (filename, mode)
>>  Open the file whose name is @var{filename}, and return a port
>>  representing that file.  The attributes of the port are
>> @@ -900,8 +903,17 @@ to the underlying @code{open} call.  Still, the flag is 
>> generally useful
>>  because of its port encoding ramifications.
>>  @end table
>>  
>> -If a file cannot be opened with the access
>> -requested, @code{open-file} throws an exception.
>> +Unless binary mode is requested, the character encoding of the new port
>> +is determined as follows: First, if @var{guess-encoding} is true,
>> +heuristics will be used to guess the encoding of the file.  If it is
>
> “heuristics” is vague.  I’d prefer “the ‘file-encoding’ procedure is
> called to check for Emacs-style coding declarations (@pxref{Character
> Encoding of Source Files})”.  Should BOMs also be mentioned?

Makes sense.  Following our discussion on IRC, the patch below has the
following wording:

  Unless binary mode is requested, the character encoding of the new
  port is determined as follows: First, if @var{guess-encoding} is true,
  the @code{file-encoding} procedure is used to guess the encoding of
  the file (@pxref{Character Encoding of Source Files}).  [...]

I left out the "Emacs-style coding declarations" language here, because
I want to leave open the possibility of adding additional heuristics to
'file-encoding' in the future.

Furthermore, I've attached an additional patch below that changes the
"Character Encoding of Source Files" node to make that more clear.
Please let me know what you think.

>> +SCM k_guess_encoding = SCM_UNDEFINED;
>> +SCM k_encoding       = SCM_UNDEFINED;
>
> Add ‘static’.

Gah, thanks for catching that.  Oops!

>> +  scm_c_bind_keyword_arguments (FUNC_NAME, keyword_args, 0,
>> +                                k_guess_encoding, &guess_encoding,
>> +                                k_encoding, &encoding,
>> +                                SCM_UNDEFINED);
>
> Comes in handy.  ;-)

This patch was the motivation for adding that :)

>> +(define* (open-input-file
>> +          str #:key (binary #f) (encoding #f) (guess-encoding #f))
>> +  "Takes a string naming an existing file and returns an input port
>> +capable of delivering characters from the file.  If the file
>> +cannot be opened, an error is signalled."
>
> It’s a detail, for these procedures, I would s/str/file/, and in
> docstrings s/file STR/FILE/.

Good idea.

An updated patch, and the aforementioned new patch for the
'file-encoding' docs, follow.  More thoughts?

    Thanks!
      Mark


>From d7a915ba0fec4cc7d007a5582ed31b0135ff8123 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <address@hidden>
Date: Sat, 6 Apr 2013 23:19:55 -0400
Subject: [PATCH 1/2] Add keyword arguments to file opening procedures.

* libguile/fports.c (scm_open_file_with_encoding): New API function,
  containing the code previously found in 'scm_open_file', but modified
  to accept the new 'guess_encoding' and 'encoding' arguments.

  (scm_open_file): Now just a simple wrapper that calls
  'scm_open_file_with_encoding'.

  (scm_i_open_file): New implementation of 'open-file' that accepts
  keyword arguments '#:guess-encoding' and '#:encoding', and calls
  'scm_open_file_with_encoding'.

  (scm_init_fports_keywords): New initialization function that gets
  called after keywords are initialized.

* libguile/fports.h (scm_open_file_with_encoding,
  scm_init_fports_keywords): Add prototypes.

* libguile/init.c (scm_i_init_guile): Call 'scm_init_fports_keywords'.

* module/ice-9/boot-9.scm: Add enhanced versions of 'open-input-file',
  'open-output-file', 'call-with-input-file', 'call-with-output-file',
  'with-input-from-file', 'with-output-to-file', and
  'with-error-to-file', that accept keyword arguments '#:binary',
  '#:encoding', and (for input port constructors) '#:guess-encoding'.

* doc/ref/api-io.texi (File Ports): Update documentation.

* test-suite/tests/ports.test ("keyword arguments for file openers"):
  Add tests.
---
 doc/ref/api-io.texi         |   61 ++++++---
 libguile/fports.c           |  191 +++++++++++++++++++++--------
 libguile/fports.h           |    3 +
 libguile/init.c             |    1 +
 module/ice-9/boot-9.scm     |  110 +++++++++++++++++
 test-suite/tests/ports.test |  285 ++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 581 insertions(+), 70 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 9483166..4c42de8 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -843,7 +843,10 @@ Most systems have limits on how many files can be open, so 
it's
 strongly recommended that file ports be closed explicitly when no
 longer required (@pxref{Ports}).
 
address@hidden {Scheme Procedure} open-file filename mode
address@hidden {Scheme Procedure} open-file filename mode @
+                          [#:guess-encoding=#f] [#:encoding=#f]
address@hidden {C Function} scm_open_file_with_encoding @
+                     (filename, mode, guess_encoding, encoding)
 @deffnx {C Function} scm_open_file (filename, mode)
 Open the file whose name is @var{filename}, and return a port
 representing that file.  The attributes of the port are
@@ -900,8 +903,18 @@ to the underlying @code{open} call.  Still, the flag is 
generally useful
 because of its port encoding ramifications.
 @end table
 
-If a file cannot be opened with the access
-requested, @code{open-file} throws an exception.
+Unless binary mode is requested, the character encoding of the new port
+is determined as follows: First, if @var{guess-encoding} is true, the
address@hidden procedure is used to guess the encoding of the file
+(@pxref{Character Encoding of Source Files}).  If @var{guess-encoding}
+is false or if @code{file-encoding} fails, @var{encoding} is used unless
+it is also false.  As a last resort, the default port encoding is used.
address@hidden, for more information on port encodings.  It is an error to
+pass a non-false @var{guess-encoding} or @var{encoding} if binary mode
+is requested.
+
+If a file cannot be opened with the access requested, @code{open-file}
+throws an exception.
 
 When the file is opened, its encoding is set to the current
 @code{%default-port-encoding}, unless the @code{b} flag was supplied.
@@ -924,23 +937,40 @@ current interfaces.
 @end deffn
 
 @rnindex open-input-file
address@hidden {Scheme Procedure} open-input-file filename
-Open @var{filename} for input.  Equivalent to
address@hidden {Scheme Procedure} open-input-file filename @
+       [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for input.  If @var{binary} is true, open the port
+in binary mode, otherwise use text mode.  @var{encoding} and
address@hidden determine the character encoding as described above
+for @code{open-file}.  Equivalent to
 @lisp
-(open-file @var{filename} "r")
+(open-file @var{filename}
+           (if @var{binary} "rb" "r")
+           #:guess-encoding @var{guess-encoding}
+           #:encoding @var{encoding})
 @end lisp
 @end deffn
 
 @rnindex open-output-file
address@hidden {Scheme Procedure} open-output-file filename
-Open @var{filename} for output.  Equivalent to
address@hidden {Scheme Procedure} open-output-file filename @
+       [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for output.  If @var{binary} is true, open the port
+in binary mode, otherwise use text mode.  @var{encoding} specifies the
+character encoding as described above for @code{open-file}.  Equivalent
+to
 @lisp
-(open-file @var{filename} "w")
+(open-file @var{filename}
+           (if @var{binary} "wb" "w")
+           #:encoding @var{encoding})
 @end lisp
 @end deffn
 
address@hidden {Scheme Procedure} call-with-input-file filename proc
address@hidden {Scheme Procedure} call-with-output-file filename proc
address@hidden {Scheme Procedure} call-with-input-file filename proc @
+        [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
address@hidden {Scheme Procedure} call-with-output-file filename proc @
+        [#:encoding=#f] [#:binary=#f]
 @rnindex call-with-input-file
 @rnindex call-with-output-file
 Open @var{filename} for input or output, and call @code{(@var{proc}
@@ -955,9 +985,12 @@ closed automatically, though it will be garbage collected 
in the usual
 way if not otherwise referenced.
 @end deffn
 
address@hidden {Scheme Procedure} with-input-from-file filename thunk
address@hidden {Scheme Procedure} with-output-to-file filename thunk
address@hidden {Scheme Procedure} with-error-to-file filename thunk
address@hidden {Scheme Procedure} with-input-from-file filename thunk @
+        [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
address@hidden {Scheme Procedure} with-output-to-file filename thunk @
+        [#:encoding=#f] [#:binary=#f]
address@hidden {Scheme Procedure} with-error-to-file filename thunk @
+        [#:encoding=#f] [#:binary=#f]
 @rnindex with-input-from-file
 @rnindex with-output-to-file
 Open @var{filename} and call @code{(@var{thunk})} with the new port
diff --git a/libguile/fports.c b/libguile/fports.c
index b9a9942..70732e5 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -315,65 +315,35 @@ fport_canonicalize_filename (SCM filename)
     }
 }
 
+/* scm_open_file_with_encoding
+   Return a new port open on a given file.
 
-/* scm_open_file
- * Return a new port open on a given file.
- *
- * The mode string must match the pattern: [rwa+]** which
- * is interpreted in the usual unix way.
- *
- * Return the new port.
- */
-SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
-           (SCM filename, SCM mode),
-           "Open the file whose name is @var{filename}, and return a port\n"
-           "representing that file.  The attributes of the port are\n"
-           "determined by the @var{mode} string.  The way in which this is\n"
-           "interpreted is similar to C stdio.  The first character must be\n"
-           "one of the following:\n"
-           "@table @samp\n"
-           "@item r\n"
-           "Open an existing file for input.\n"
-           "@item w\n"
-           "Open a file for output, creating it if it doesn't already exist\n"
-           "or removing its contents if it does.\n"
-           "@item a\n"
-           "Open a file for output, creating it if it doesn't already\n"
-           "exist.  All writes to the port will go to the end of the file.\n"
-           "The \"append mode\" can be turned off while the port is in use\n"
-           "@pxref{Ports and File Descriptors, fcntl}\n"
-           "@end table\n"
-           "The following additional characters can be appended:\n"
-           "@table @samp\n"
-           "@item b\n"
-           "Open the underlying file in binary mode, if supported by the 
system.\n"
-           "Also, open the file using the binary-compatible character 
encoding\n"
-           "\"ISO-8859-1\", ignoring the default port encoding.\n"
-           "@item +\n"
-           "Open the port for both input and output.  E.g., @code{r+}: open\n"
-           "an existing file for both input and output.\n"
-           "@item 0\n"
-           "Create an \"unbuffered\" port.  In this case input and output\n"
-           "operations are passed directly to the underlying port\n"
-           "implementation without additional buffering.  This is likely to\n"
-           "slow down I/O operations.  The buffering mode can be changed\n"
-           "while a port is in use @pxref{Ports and File Descriptors,\n"
-           "setvbuf}\n"
-           "@item l\n"
-           "Add line-buffering to the port.  The port output buffer will be\n"
-           "automatically flushed whenever a newline character is written.\n"
-           "@end table\n"
-           "In theory we could create read/write ports which were buffered\n"
-           "in one direction only.  However this isn't included in the\n"
-           "current interfaces.  If a file cannot be opened with the access\n"
-           "requested, @code{open-file} throws an exception.")
-#define FUNC_NAME s_scm_open_file
+   The mode string must match the pattern: [rwa+]** which
+   is interpreted in the usual unix way.
+
+   Unless binary mode is requested, the character encoding of the new
+   port is determined as follows: First, if GUESS_ENCODING is true,
+   'file-encoding' is used to guess the encoding of the file.  If
+   GUESS_ENCODING is false or if 'file-encoding' fails, ENCODING is used
+   unless it is also false.  As a last resort, the default port encoding
+   is used.  It is an error to pass a non-false GUESS_ENCODING or
+   ENCODING if binary mode is requested.
+
+   Return the new port. */
+SCM
+scm_open_file_with_encoding (SCM filename, SCM mode,
+                             SCM guess_encoding, SCM encoding)
+#define FUNC_NAME "open-file"
 {
   SCM port;
   int fdes, flags = 0, binary = 0;
   unsigned int retries;
   char *file, *md, *ptr;
 
+  if (SCM_UNLIKELY (!(scm_is_false (encoding) || scm_is_string (encoding))))
+    scm_wrong_type_arg_msg (FUNC_NAME, 0, encoding,
+                            "encoding to be string or false");
+
   scm_dynwind_begin (0);
 
   file = scm_to_locale_string (filename);
@@ -445,8 +415,43 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
                              fport_canonicalize_filename (filename));
 
   if (binary)
-    /* Use the binary-friendly ISO-8859-1 encoding. */
-    scm_i_set_port_encoding_x (port, NULL);
+    {
+      if (scm_is_true (encoding))
+        scm_misc_error (FUNC_NAME,
+                        "Encoding specified on a binary port",
+                        scm_list_1 (encoding));
+      if (scm_is_true (guess_encoding))
+        scm_misc_error (FUNC_NAME,
+                        "Request to guess encoding on a binary port",
+                        SCM_EOL);
+
+      /* Use the binary-friendly ISO-8859-1 encoding. */
+      scm_i_set_port_encoding_x (port, NULL);
+    }
+  else
+    {
+      char *enc = NULL;
+
+      if (scm_is_true (guess_encoding))
+        {
+          if (SCM_INPUT_PORT_P (port))
+            enc = scm_i_scan_for_encoding (port);
+          else
+            scm_misc_error (FUNC_NAME,
+                            "Request to guess encoding on an output-only port",
+                            SCM_EOL);
+        }
+
+      if (!enc && scm_is_true (encoding))
+        {
+          char *buf = scm_to_latin1_string (encoding);
+          enc = scm_gc_strdup (buf, "encoding");
+          free (buf);
+        }
+
+      if (enc)
+        scm_i_set_port_encoding_x (port, enc);
+    }
 
   scm_dynwind_end ();
 
@@ -454,6 +459,75 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 }
 #undef FUNC_NAME
 
+SCM
+scm_open_file (SCM filename, SCM mode)
+{
+  return scm_open_file_with_encoding (filename, mode, SCM_BOOL_F, SCM_BOOL_F);
+}
+
+/* We can't define these using SCM_KEYWORD, because keywords have not
+   yet been initialized when scm_init_fports is called.  */
+static SCM k_guess_encoding = SCM_UNDEFINED;
+static SCM k_encoding       = SCM_UNDEFINED;
+
+SCM_DEFINE (scm_i_open_file, "open-file", 2, 0, 1,
+           (SCM filename, SCM mode, SCM keyword_args),
+           "Open the file whose name is @var{filename}, and return a port\n"
+           "representing that file.  The attributes of the port are\n"
+           "determined by the @var{mode} string.  The way in which this is\n"
+           "interpreted is similar to C stdio.  The first character must be\n"
+           "one of the following:\n"
+           "@table @samp\n"
+           "@item r\n"
+           "Open an existing file for input.\n"
+           "@item w\n"
+           "Open a file for output, creating it if it doesn't already exist\n"
+           "or removing its contents if it does.\n"
+           "@item a\n"
+           "Open a file for output, creating it if it doesn't already\n"
+           "exist.  All writes to the port will go to the end of the file.\n"
+           "The \"append mode\" can be turned off while the port is in use\n"
+           "@pxref{Ports and File Descriptors, fcntl}\n"
+           "@end table\n"
+           "The following additional characters can be appended:\n"
+           "@table @samp\n"
+           "@item b\n"
+           "Open the underlying file in binary mode, if supported by the 
system.\n"
+           "Also, open the file using the binary-compatible character 
encoding\n"
+           "\"ISO-8859-1\", ignoring the default port encoding.\n"
+           "@item +\n"
+           "Open the port for both input and output.  E.g., @code{r+}: open\n"
+           "an existing file for both input and output.\n"
+           "@item 0\n"
+           "Create an \"unbuffered\" port.  In this case input and output\n"
+           "operations are passed directly to the underlying port\n"
+           "implementation without additional buffering.  This is likely to\n"
+           "slow down I/O operations.  The buffering mode can be changed\n"
+           "while a port is in use @pxref{Ports and File Descriptors,\n"
+           "setvbuf}\n"
+           "@item l\n"
+           "Add line-buffering to the port.  The port output buffer will be\n"
+           "automatically flushed whenever a newline character is written.\n"
+           "@end table\n"
+           "In theory we could create read/write ports which were buffered\n"
+           "in one direction only.  However this isn't included in the\n"
+           "current interfaces.  If a file cannot be opened with the access\n"
+           "requested, @code{open-file} throws an exception.")
+#define FUNC_NAME s_scm_i_open_file
+{
+  SCM encoding = SCM_BOOL_F;
+  SCM guess_encoding = SCM_BOOL_F;
+
+  scm_c_bind_keyword_arguments (FUNC_NAME, keyword_args, 0,
+                                k_guess_encoding, &guess_encoding,
+                                k_encoding, &encoding,
+                                SCM_UNDEFINED);
+
+  return scm_open_file_with_encoding (filename, mode,
+                                      guess_encoding, encoding);
+}
+#undef FUNC_NAME
+

 /* Building Guile ports from a file descriptor.  */
 
@@ -804,6 +878,15 @@ scm_make_fptob ()
   return tc;
 }
 
+/* We can't initialize the keywords from 'scm_init_fports', because
+   keywords haven't yet been initialized at that point.  */
+void
+scm_init_fports_keywords ()
+{
+  k_guess_encoding = scm_from_latin1_keyword ("guess-encoding");
+  k_encoding       = scm_from_latin1_keyword ("encoding");
+}
+
 void
 scm_init_fports ()
 {
diff --git a/libguile/fports.h b/libguile/fports.h
index cbef0f8..c32ed95 100644
--- a/libguile/fports.h
+++ b/libguile/fports.h
@@ -51,9 +51,12 @@ SCM_API scm_t_bits scm_tc16_fport;
 SCM_API SCM scm_setbuf0 (SCM port);
 SCM_API SCM scm_setvbuf (SCM port, SCM mode, SCM size);
 SCM_API void scm_evict_ports (int fd);
+SCM_API SCM scm_open_file_with_encoding (SCM filename, SCM modes,
+                                         SCM guess_encoding, SCM encoding);
 SCM_API SCM scm_open_file (SCM filename, SCM modes);
 SCM_API SCM scm_fdes_to_port (int fdes, char *mode, SCM name);
 SCM_API SCM scm_file_port_p (SCM obj);
+SCM_INTERNAL void scm_init_fports_keywords (void);
 SCM_INTERNAL void scm_init_fports (void);
 
 /* internal functions */
diff --git a/libguile/init.c b/libguile/init.c
index 57e4902..455a772 100644
--- a/libguile/init.c
+++ b/libguile/init.c
@@ -444,6 +444,7 @@ scm_i_init_guile (void *base)
   scm_init_gettext ();
   scm_init_ioext ();
   scm_init_keywords ();    /* Requires smob_prehistory */
+  scm_init_fports_keywords ();
   scm_init_list ();
   scm_init_random ();      /* Requires smob_prehistory */
   scm_init_macros ();      /* Requires smob_prehistory and random */
diff --git a/module/ice-9/boot-9.scm b/module/ice-9/boot-9.scm
index 8461ee8..0779d27 100644
--- a/module/ice-9/boot-9.scm
+++ b/module/ice-9/boot-9.scm
@@ -753,6 +753,116 @@ information is unavailable."

 
 ;;;
+;;; Enhanced file opening procedures
+;;;
+
+(define* (open-input-file
+          file #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "Takes a string naming an existing file and returns an input port
+capable of delivering characters from the file.  If the file
+cannot be opened, an error is signalled."
+  (open-file file (if binary "rb" "r")
+             #:encoding encoding
+             #:guess-encoding guess-encoding))
+
+(define* (open-output-file file #:key (binary #f) (encoding #f))
+  "Takes a string naming an output file to be created and returns an
+output port capable of writing characters to a new file by that
+name.  If the file cannot be opened, an error is signalled.  If a
+file with the given name already exists, the effect is unspecified."
+  (open-file file (if binary "wb" "w")
+             #:encoding encoding))
+
+(define* (call-with-input-file
+          file proc #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "PROC should be a procedure of one argument, and FILE should be a
+string naming a file.  The file must
+already exist. These procedures call PROC
+with one argument: the port obtained by opening the named file for
+input or output.  If the file cannot be opened, an error is
+signalled.  If the procedure returns, then the port is closed
+automatically and the values yielded by the procedure are returned.
+If the procedure does not return, then the port will not be closed
+automatically unless it is possible to prove that the port will
+never again be used for a read or write operation."
+  (let ((p (open-input-file file
+                            #:binary binary
+                            #:encoding encoding
+                            #:guess-encoding guess-encoding)))
+    (call-with-values
+      (lambda () (proc p))
+      (lambda vals
+        (close-input-port p)
+        (apply values vals)))))
+
+(define* (call-with-output-file file proc #:key (binary #f) (encoding #f))
+  "PROC should be a procedure of one argument, and FILE should be a
+string naming a file.  The behaviour is unspecified if the file
+already exists. These procedures call PROC
+with one argument: the port obtained by opening the named file for
+input or output.  If the file cannot be opened, an error is
+signalled.  If the procedure returns, then the port is closed
+automatically and the values yielded by the procedure are returned.
+If the procedure does not return, then the port will not be closed
+automatically unless it is possible to prove that the port will
+never again be used for a read or write operation."
+  (let ((p (open-output-file file #:binary binary #:encoding encoding)))
+    (call-with-values
+      (lambda () (proc p))
+      (lambda vals
+        (close-output-port p)
+        (apply values vals)))))
+
+(define* (with-input-from-file
+          file thunk #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The file must already exist. The file is opened for
+input, an input port connected to it is made
+the default value returned by `current-input-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-input-file file
+   (lambda (p) (with-input-from-port p thunk))
+   #:binary binary
+   #:encoding encoding
+   #:guess-encoding guess-encoding))
+
+(define* (with-output-to-file file thunk #:key (binary #f) (encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The effect is unspecified if the file already exists.
+The file is opened for output, an output port connected to it is made
+the default value returned by `current-output-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-output-file file
+   (lambda (p) (with-output-to-port p thunk))
+   #:binary binary
+   #:encoding encoding))
+
+(define* (with-error-to-file file thunk #:key (binary #f) (encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The effect is unspecified if the file already exists.
+The file is opened for output, an output port connected to it is made
+the default value returned by `current-error-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-output-file file
+   (lambda (p) (with-error-to-port p thunk))
+   #:binary binary
+   #:encoding encoding))
+
+
+
+;;;
 ;;; Extensible exception printing.
 ;;;
 
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index fc6d087..8e3df5b 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -274,8 +274,8 @@
                    (delete-file filename)
                    (string=? line2 binary-test-string)))))
 
-;; open-file ignores file coding declaration
-(pass-if "file: open-file ignores coding declarations"
+;; open-file ignores file coding declaration by default
+(pass-if "file: open-file ignores coding declaration by default"
   (with-fluids ((%default-port-encoding "UTF-8"))
                (let* ((filename (test-file))
                       (port (open-output-file filename))
@@ -290,6 +290,287 @@
                    (delete-file filename)
                    (string=? line2 test-string)))))
 
+;; open-input-file with guess-encoding honors coding declaration
+(pass-if "file: open-input-file with guess-encoding honors coding declaration"
+  (with-fluids ((%default-port-encoding "UTF-8"))
+               (let* ((filename (test-file))
+                      (port (open-output-file filename))
+                      (test-string "€100"))
+                 (set-port-encoding! port "iso-8859-15")
+                 (write-line ";; coding: iso-8859-15" port)
+                 (write-line test-string port)
+                 (close-port port)
+                 (let* ((in-port (open-input-file filename
+                                                  #:guess-encoding #t))
+                        (line1 (read-line in-port))
+                        (line2 (read-line in-port)))
+                   (close-port in-port)
+                   (delete-file filename)
+                   (string=? line2 test-string)))))
+
+(with-test-prefix "keyword arguments for file openers"
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    (let ((filename (test-file)))
+
+      (with-test-prefix "write #:encoding"
+
+        (pass-if-equal "open-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (let ((port (open-file filename "w"
+                                   #:encoding "UTF-16LE")))
+              (display "test" port)
+              (close-port port))
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "open-output-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (let ((port (open-output-file filename
+                                          #:encoding "UTF-16LE")))
+              (display "test" port)
+              (close-port port))
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "call-with-output-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (call-with-output-file filename
+              (lambda (port)
+                (display "test" port))
+              #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "with-output-to-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (with-output-to-file filename
+              (lambda ()
+                (display "test"))
+              #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "with-error-to-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (with-error-to-file
+             filename
+             (lambda ()
+               (display "test" (current-error-port)))
+             #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv)))
+
+      (with-test-prefix "write #:binary"
+
+        (pass-if-equal "open-output-file"
+            "ISO-8859-1"
+          (let* ((port (open-output-file filename #:binary #t))
+                 (enc (port-encoding port)))
+            (close-port port)
+            enc))
+
+        (pass-if-equal "call-with-output-file"
+            "ISO-8859-1"
+          (call-with-output-file filename port-encoding #:binary #t))
+
+        (pass-if-equal "with-output-to-file"
+            "ISO-8859-1"
+          (with-output-to-file filename
+            (lambda () (port-encoding (current-output-port)))
+            #:binary #t))
+
+        (pass-if-equal "with-error-to-file"
+            "ISO-8859-1"
+          (with-error-to-file
+           filename
+           (lambda () (port-encoding (current-error-port)))
+           #:binary #t)))
+
+      (with-test-prefix "read #:encoding"
+
+        (pass-if-equal "open-file read #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (let* ((port (open-file filename "r" #:encoding "UTF-16LE"))
+                 (str  (read-string port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (let* ((port (open-input-file filename #:encoding "UTF-16LE"))
+                 (str  (read-string port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (call-with-input-file filename
+            read-string
+            #:encoding "UTF-16LE"))
+
+        (pass-if-equal "with-input-from-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (with-input-from-file filename
+            read-string
+            #:encoding "UTF-16LE")))
+
+      (with-test-prefix "read #:binary"
+
+        (pass-if-equal "open-input-file"
+            "ISO-8859-1"
+          (let* ((port (open-input-file filename #:binary #t))
+                 (enc (port-encoding port)))
+            (close-port port)
+            enc))
+
+        (pass-if-equal "call-with-input-file"
+            "ISO-8859-1"
+          (call-with-input-file filename port-encoding #:binary #t))
+
+        (pass-if-equal "with-input-from-file"
+            "ISO-8859-1"
+          (with-input-from-file filename
+            (lambda () (port-encoding (current-input-port)))
+            #:binary #t)))
+
+      (with-test-prefix "#:guess-encoding with coding declaration"
+
+        (pass-if-equal "open-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-file filename "r"
+                                  #:guess-encoding #t
+                                  #:encoding "UTF-16LE"))
+                 (str (begin (read-line port)
+                             (read-line port)
+                             (read-line port))))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-input-file filename
+                                        #:guess-encoding #t
+                                        #:encoding "UTF-16LE"))
+                 (str (begin (read-line port)
+                             (read-line port)
+                             (read-line port))))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (call-with-input-file filename
+            (lambda (port)
+              (read-line port)
+              (read-line port)
+              (read-line port))
+            #:guess-encoding #t
+            #:encoding "UTF-16LE"))
+
+        (pass-if-equal "with-input-from-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (with-input-from-file filename
+            (lambda ()
+              (read-line)
+              (read-line)
+              (read-line))
+            #:guess-encoding #t
+            #:encoding "UTF-16LE")))
+
+      (with-test-prefix "#:guess-encoding without coding declaration"
+
+        (pass-if-equal "open-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-file filename "r"
+                                  #:guess-encoding #t
+                                  #:encoding "ISO-8859-15"))
+                 (str (read-line port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-input-file filename
+                                        #:guess-encoding #t
+                                        #:encoding "ISO-8859-15"))
+                 (str (read-line port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (call-with-input-file filename
+            read-line
+            #:guess-encoding #t
+            #:encoding "ISO-8859-15"))
+
+        (pass-if-equal "with-input-from-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (with-input-from-file filename
+            read-line
+            #:guess-encoding #t
+            #:encoding "ISO-8859-15")))
+
+      (delete-file filename))))
+
 ;;; ungetting characters and strings.
 (with-input-from-string "walk on the moon\nmoon"
                        (lambda ()
-- 
1.7.10.4

>From 3e9890ea809bd77a36c1005935b73cf4ca18e691 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <address@hidden>
Date: Sun, 7 Apr 2013 12:07:33 -0400
Subject: [PATCH 2/2] Clarify 'file-encoding' docs: heuristics may be improved
 later.

* doc/ref/api-evaluation.texi (Character Encoding of Source Files):
  Mention UTF-8 as another common encoding used for Scheme source files,
  and that it is used by default.  Change the description to leave open
  the possibility of adding additional heuristics in the future.
  Mention that if the coding declaration is in a #!-style block comment,
  it must be the first such comment in the file.  Mention the
  '#:guess-encoding' keyword argument.
---
 doc/ref/api-evaluation.texi |   36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/doc/ref/api-evaluation.texi b/doc/ref/api-evaluation.texi
index 7afbcfa..63b1d60 100644
--- a/doc/ref/api-evaluation.texi
+++ b/doc/ref/api-evaluation.texi
@@ -991,17 +991,19 @@ three arguments.
 @cindex source file encoding
 @cindex primitive-load
 @cindex load
-Scheme source code files are usually encoded in ASCII, but, the
-built-in reader can interpret other character encodings.  The
-procedure @code{primitive-load}, and by extension the functions that
-call it, such as @code{load}, first scan the top 500 characters of the
-file for a coding declaration.
+Scheme source code files are usually encoded in ASCII or UTF-8, but the
+built-in reader can interpret other character encodings as well.  When
+Guile loads Scheme source code, it uses the @code{file-encoding}
+procedure (described below) to try to guess the encoding of the file.
+In the absence of any hints, UTF-8 is assumed.  One way to provide a
+hint about the encoding of a source file is to place a coding
+declaration in the top 500 characters of the file.
 
 A coding declaration has the form @code{coding: XXXXXX}, where
 @code{XXXXXX} is the name of a character encoding in which the source
 code file has been encoded.  The coding declaration must appear in a
-scheme comment.  It can either be a semicolon-initiated comment or a block
address@hidden comment.
+scheme comment.  It can either be a semicolon-initiated comment, or the
+first block @code{#!} comment in the file.
 
 The name of the character encoding in the coding declaration is
 typically lower case and containing only letters, numbers, and hyphens,
@@ -1050,15 +1052,21 @@ the port's character encoding should be set to the 
encoding returned
 by @code{file-encoding}, if any, again by using
 @code{set-port-encoding!}.  Then the code can be read as normal.
 
+Alternatively, one can use the @code{#:guess-encoding} keyword argument
+of @code{open-file} and related procedures.  @xref{File Ports}.
+
 @deffn {Scheme Procedure} file-encoding port
 @deffnx {C Function} scm_file_encoding (port)
-Scan the port for an Emacs-like character coding declaration near the
-top of the contents of a port with random-accessible contents
-(@pxref{Recognize Coding, how Emacs recognizes file encoding,, emacs,
-The GNU Emacs Reference Manual}).  The coding declaration is of the form
address@hidden: XXXXX} and must appear in a Scheme comment.  Return a
-string containing the character encoding of the file if a declaration
-was found, or @code{#f} otherwise.  The port is rewound.
+Attempt to scan the first few hundred bytes from the @var{port} for
+hints about its character encoding.  Return a string containing the
+encoding name or @code{#f} if the encoding cannot be determined.  The
+port is rewound.
+
+Currently, the only supported method is to look for an Emacs-like
+character coding declaration (@pxref{Recognize Coding, how Emacs
+recognizes file encoding,, emacs, The GNU Emacs Reference Manual}).  The
+coding declaration is of the form @code{coding: XXXXX} and must appear
+in a Scheme comment.  Additional heuristics may be added in the future.
 @end deffn
 
 
-- 
1.7.10.4


reply via email to

[Prev in Thread] Current Thread [Next in Thread]