[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.7-73-g419c87
From: |
Andy Wingo |
Subject: |
[Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.7-73-g419c873 |
Date: |
Wed, 30 Jan 2013 14:30:57 +0000 |
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".
http://git.savannah.gnu.org/cgit/guile.git/commit/?id=419c8736243e78a60020f5215ab223af6e9b7bb1
The branch, stable-2.0 has been updated
via 419c8736243e78a60020f5215ab223af6e9b7bb1 (commit)
via e10c250928bc6c4116d6344616d39f3c52edc36b (commit)
from 1260fd0b2c4ce1d0d7e7b17df924c245f67f9058 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit 419c8736243e78a60020f5215ab223af6e9b7bb1
Author: Andy Wingo <address@hidden>
Date: Wed Jan 30 15:30:31 2013 +0100
Revert "detect and consume byte-order marks for textual ports"
This reverts commit b2cb557d75e4daf8c7c8cd43313f4cc51d9a3f1b, which was
pushed accidentally.
commit e10c250928bc6c4116d6344616d39f3c52edc36b
Author: Andy Wingo <address@hidden>
Date: Wed Jan 30 15:29:18 2013 +0100
add #:doctype-handler to xml->sxml
* module/sxml/simple.scm (read-internal-doctype-as-string): New helper.
(xml->sxml): Add #:doctype-handler argument.
* doc/ref/sxml.texi (Reading and Writing XML): Document
#:doctype-handler. Fix some other examples, and fix the default value
of #:declare-namespaces?.
* test-suite/tests/sxml.simple.test: Add all tests from the manual
here.
-----------------------------------------------------------------------
Summary of changes:
doc/ref/sxml.texi | 52 +++++++++++++++++++----
libguile/fports.c | 35 +++++++---------
libguile/load.c | 3 -
libguile/ports.c | 85 +------------------------------------
libguile/ports.h | 3 +-
libguile/read.c | 14 +++++-
module/sxml/simple.scm | 56 +++++++++++++++++++------
test-suite/tests/filesys.test | 59 +-------------------------
test-suite/tests/sxml.simple.test | 85 ++++++++++++++++++++++++++++++++++++-
9 files changed, 200 insertions(+), 192 deletions(-)
diff --git a/doc/ref/sxml.texi b/doc/ref/sxml.texi
index 66584bf..6dc261f 100644
--- a/doc/ref/sxml.texi
+++ b/doc/ref/sxml.texi
@@ -57,7 +57,8 @@ to text.
@deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
[#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
- [#:entities='()] [#:default-entity-handler=#f]
+ [#:entities='()] [#:default-entity-handler=#f] @
+ [#:doctype-handler=#f]
Use SSAX to parse an XML document into SXML. Takes one optional
argument, @var{string-or-port}, which defaults to the current input
port. Returns the resulting SXML document. If @var{string-or-port} is
@@ -99,18 +100,19 @@ for certain namespaces with the @code{#:namespaces}
keyword argument to
@result{} (*TOP* (foo (ns2:baz)))
@end example
-Passing a true @code{#:declare-namespaces?} argument will cause the
-user-given @code{#:namespaces} to be treated as if they were declared on
-the root element.
+By default, namespaces passed to @code{xml->sxml} are treated as if they
+were declared on the root element. Passing a false
address@hidden:declare-namespaces?} argument will disable this behavior,
+requiring in-document declarations of namespaces before use..
@example
(xml->sxml "<foo><ns2:baz/></foo>"
#:namespaces '((ns2 . "http://example.org/ns2")))
address@hidden error: undeclared namespace: `bar'
address@hidden (*TOP* (foo (ns2:baz)))
(xml->sxml "<foo><ns2:baz/></foo>"
#:namespaces '((ns2 . "http://example.org/ns2"))
- #:declare-namespaces? #t)
address@hidden (*TOP* (foo (ns2:baz)))
+ #:declare-namespaces? #f)
address@hidden error: undeclared namespace: `bar'
@end example
By default, all whitespace in XML is significant. Passing the
@@ -120,10 +122,10 @@ whitespace in front, behind and between elements,
treating it as
@example
(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
address@hidden (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n")
address@hidden (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
#:trim-whitespace? #t)
address@hidden (*TOP* (foo (bar " Alfie the parrot! "))
address@hidden (*TOP* (foo (bar " Alfie the parrot! ")))
@end example
Parsed entities may be declared with the @code{#:entities} keyword
@@ -159,6 +161,38 @@ numeric character entities.
@result{} (*TOP* (foo "\xa0 foo"))
@end example
+By default, @code{xml->sxml} skips over the @code{<!DOCTYPE>}
+declaration, if any. This behavior can be overridden with the
address@hidden:doctype-handler} argument, which should be a procedure of three
+arguments: the @dfn{docname} (a symbol), @dfn{systemid} (a string), and
+the internal doctype subset (as a string or @code{#f} if not present).
+
+The handler should return keyword arguments as multiple values, as if it
+were calling its continuation with keyword arguments. The continuation
+accepts the @code{#:entities} and @code{#:namespaces} keyword arguments,
+in the same format that @code{xml->sxml} itself takes. These entities
+and namespaces will be prepended to those given to the @code{xml->sxml}
+invocation.
+
address@hidden
+(define (handle-foo docname systemid internal-subset)
+ (case docname
+ ((foo)
+ (values #:entities '((greets . "<i>Hello, world!</i>"))))
+ (else
+ (values))))
+
+(xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
+ #:doctype-handler handle-foo)
address@hidden (*TOP* (p (i "Hello, world!")))
address@hidden example
+
+If the document has no doctype declaration, the @var{doctype-handler} is
+invoked with @code{#f} for the three arguments.
+
+In the future, the continuation may accept other keyword arguments, for
+example to validate the parsed SXML against the doctype.
+
@deffn {Scheme Procedure} sxml->xml tree [port]
Serialize the SXML tree @var{tree} as XML. The output will be written to
the current output port, unless the optional argument @var{port} is
diff --git a/libguile/fports.c b/libguile/fports.c
index fbc0530..10cf671 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -1,5 +1,5 @@
/* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
- * 2004, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 Free Software
Foundation, Inc.
+ * 2004, 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation,
Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
@@ -399,7 +399,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
#define FUNC_NAME s_scm_open_file
{
SCM port;
- int fdes, flags = 0, scan_for_encoding = 0, consume_bom = 0, binary = 0;
+ int fdes, flags = 0, use_encoding = 1;
unsigned int retries;
char *file, *md, *ptr;
@@ -415,8 +415,6 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
{
case 'r':
flags |= O_RDONLY;
- consume_bom = 1;
- scan_for_encoding = 1;
break;
case 'w':
flags |= O_WRONLY | O_CREAT | O_TRUNC;
@@ -434,12 +432,9 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
{
case '+':
flags = (flags & ~(O_RDONLY | O_WRONLY)) | O_RDWR;
- consume_bom = 0;
break;
case 'b':
- scan_for_encoding = 0;
- consume_bom = 0;
- binary = 1;
+ use_encoding = 0;
#if defined (O_BINARY)
flags |= O_BINARY;
#endif
@@ -478,21 +473,21 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
port = scm_i_fdes_to_port (fdes, scm_i_mode_bits (mode),
fport_canonicalize_filename (filename));
- if (consume_bom)
- scm_consume_byte_order_mark (port);
-
- if (binary)
+ if (use_encoding)
+ {
+ /* If this file has a coding declaration, use that as the port
+ encoding. */
+ if (SCM_INPUT_PORT_P (port))
+ {
+ char *enc = scm_i_scan_for_encoding (port);
+ if (enc != NULL)
+ scm_i_set_port_encoding_x (port, enc);
+ }
+ }
+ else
/* If this is a binary file, use the binary-friendly ISO-8859-1
encoding. */
scm_i_set_port_encoding_x (port, NULL);
- else if (scan_for_encoding)
- /* If this is an input port and the file has a coding declaration,
- use that as the port encoding. */
- {
- char *enc = scm_i_scan_for_encoding (port);
- if (enc != NULL)
- scm_i_set_port_encoding_x (port, enc);
- }
scm_dynwind_end ();
diff --git a/libguile/load.c b/libguile/load.c
index 476461c..84b6705 100644
--- a/libguile/load.c
+++ b/libguile/load.c
@@ -106,9 +106,6 @@ SCM_DEFINE (scm_primitive_load, "primitive-load", 1, 0, 0,
scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE);
scm_i_dynwind_current_load_port (port);
- /* FIXME: For better or for worse, scm_open_file already scans the
- file for an encoding. This scans again; necessary for this
- logic, but unnecessary overall. */
encoding = scm_i_scan_for_encoding (port);
if (encoding)
scm_i_set_port_encoding_x (port, encoding);
diff --git a/libguile/ports.c b/libguile/ports.c
index 9b1be9b..55808e2 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -1,5 +1,5 @@
/* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004,
- * 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 Free Software Foundation,
Inc.
+ * 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
@@ -2153,89 +2153,6 @@ SCM_DEFINE (scm_set_port_filename_x,
"set-port-filename!", 2, 0, 0,
}
#undef FUNC_NAME
-SCM_DEFINE (scm_consume_byte_order_mark, "consume-byte-order-mark", 1, 0, 0,
- (SCM port),
- "Peek ahead in @var{port} for a byte-order mark (\\uFEFF)
encoded\n"
- "in UTF-8 or in UTF-16. If found, consume the byte-order mark\n"
- "and set the port to the indicated encoding.\n"
- "\n"
- "As a special case, if the port's encoding is already UTF-16LE\n"
- "or UTF-16BE (as opposed to UTF-16), we consider that the user\n"
- "has already asked for an explicit byte order. In this case no\n"
- "scan is performed, and the byte-order mark (if any) is left in\n"
- "the port.\n"
- "\n"
- "Return @code{#t} if a byte-order mark was consumed, and\n"
- "@code{#f} otherwise.")
-#define FUNC_NAME s_scm_consume_byte_order_mark
-{
- scm_t_port *pt;
- const char *enc;
-
- SCM_VALIDATE_PORT (1, port);
-
- pt = SCM_PTAB_ENTRY (port);
- enc = pt->encoding;
-
- if (enc && strcasecmp (enc, "UTF-16BE") == 0)
- return SCM_BOOL_F;
-
- if (enc && strcasecmp (enc, "UTF-16LE") == 0)
- return SCM_BOOL_F;
-
- switch (scm_peek_byte_or_eof (port))
- {
- case 0xEF:
- scm_get_byte_or_eof (port);
- switch (scm_peek_byte_or_eof (port))
- {
- case 0xBB:
- scm_get_byte_or_eof (port);
- switch (scm_peek_byte_or_eof (port))
- {
- case 0xBF:
- scm_get_byte_or_eof (port);
- scm_i_set_port_encoding_x (port, "UTF-8");
- return SCM_BOOL_T;
- default:
- scm_unget_byte (0xBB, port);
- scm_unget_byte (0xEF, port);
- return SCM_BOOL_F;
- }
- default:
- scm_unget_byte (0xEF, port);
- return SCM_BOOL_F;
- }
- case 0xFE:
- scm_get_byte_or_eof (port);
- switch (scm_peek_byte_or_eof (port))
- {
- case 0xFF:
- scm_get_byte_or_eof (port);
- scm_i_set_port_encoding_x (port, "UTF-16BE");
- return SCM_BOOL_T;
- default:
- scm_unget_byte (0xFE, port);
- return SCM_BOOL_F;
- }
- case 0xFF:
- scm_get_byte_or_eof (port);
- switch (scm_peek_byte_or_eof (port))
- {
- case 0xFE:
- scm_get_byte_or_eof (port);
- scm_i_set_port_encoding_x (port, "UTF-16LE");
- return SCM_BOOL_T;
- default:
- scm_unget_byte (0xFF, port);
- return SCM_BOOL_F;
- }
- default:
- return SCM_BOOL_F;
- }
-}
-#undef FUNC_NAME
-
/* A fluid specifying the default encoding for newly created ports. If it is
a string, that is the encoding. If it is #f, it is in the "native"
(Latin-1) encoding. */
diff --git a/libguile/ports.h b/libguile/ports.h
index 2f32345..d4d59b7 100644
--- a/libguile/ports.h
+++ b/libguile/ports.h
@@ -4,7 +4,7 @@
#define SCM_PORTS_H
/* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004,
- * 2006, 2008, 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc.
+ * 2006, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
@@ -303,7 +303,6 @@ SCM_API SCM scm_port_column (SCM port);
SCM_API SCM scm_set_port_column_x (SCM port, SCM line);
SCM_API SCM scm_port_filename (SCM port);
SCM_API SCM scm_set_port_filename_x (SCM port, SCM filename);
-SCM_API SCM scm_consume_byte_order_mark (SCM port);
SCM_INTERNAL const char *scm_i_default_port_encoding (void);
SCM_INTERNAL void scm_i_set_default_port_encoding (const char *);
SCM_INTERNAL void scm_i_set_port_encoding_x (SCM port, const char *str);
diff --git a/libguile/read.c b/libguile/read.c
index a8f7744..222891b 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1,5 +1,5 @@
/* Copyright (C) 1995, 1996, 1997, 1999, 2000, 2001, 2003, 2004, 2006,
- * 2007, 2008, 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc.
+ * 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
@@ -1985,6 +1985,7 @@ scm_i_scan_for_encoding (SCM port)
char header[SCM_ENCODING_SEARCH_SIZE+1];
size_t bytes_read, encoding_length, i;
char *encoding = NULL;
+ int utf8_bom = 0;
char *pos, *encoding_start;
int in_comment;
@@ -2026,9 +2027,13 @@ scm_i_scan_for_encoding (SCM port)
bytes_read = scm_c_read (port, header, SCM_ENCODING_SEARCH_SIZE);
header[bytes_read] = '\0';
- scm_seek (port, scm_from_int (-bytes_read), scm_from_int (SEEK_CUR));
+ scm_seek (port, scm_from_int (0), scm_from_int (SEEK_SET));
}
+ if (bytes_read > 3
+ && header[0] == '\xef' && header[1] == '\xbb' && header[2] == '\xbf')
+ utf8_bom = 1;
+
/* search past "coding[:=]" */
pos = header;
while (1)
@@ -2097,6 +2102,11 @@ scm_i_scan_for_encoding (SCM port)
/* This wasn't in a comment */
return NULL;
+ if (utf8_bom && strcmp(encoding, "UTF-8"))
+ scm_misc_error (NULL,
+ "the port input declares the encoding ~s but is encoded as
UTF-8",
+ scm_list_1 (scm_from_locale_string (encoding)));
+
return encoding;
}
diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm
index 606975d..703ad91 100644
--- a/module/sxml/simple.scm
+++ b/module/sxml/simple.scm
@@ -26,6 +26,7 @@
;;; Code:
(define-module (sxml simple)
+ #:use-module (sxml ssax input-parse)
#:use-module (sxml ssax)
#:use-module (sxml transform)
#:use-module (ice-9 match)
@@ -35,10 +36,6 @@
;; Helpers from upstream/SSAX.scm.
;;
-(define (ssax:warn port msg . args)
- (format (current-ssax-error-port)
- ";;; SSAX warning: ~a ~a\n" msg args))
-
; ssax:reverse-collect-str LIST-OF-FRAGS -> LIST-OF-FRAGS
; given the list of fragments (some of which are text strings)
; reverse the list and concatenate adjacent text strings.
@@ -65,6 +62,17 @@
(cons (string-concatenate/shared strs) result)))
'())))))))
+(define (read-internal-doctype-as-string port)
+ (string-concatenate/shared
+ (let loop ()
+ (let ((fragment
+ (next-token '() '(#\]) "reading internal DOCTYPE" port)))
+ (if (eqv? #\> (peek-next-char port))
+ (begin
+ (read-char port)
+ (cons fragment '()))
+ (cons* fragment "]" (loop)))))))
+
;; Ideas for the future for this interface:
;;
;; * Allow doctypes to provide parsed entities
@@ -81,7 +89,8 @@
(declare-namespaces? #t)
(trim-whitespace? #f)
(entities '())
- (default-entity-handler #f))
+ (default-entity-handler #f)
+ (doctype-handler #f))
"Use SSAX to parse an XML document into SXML. Takes one optional
argument, @var{string-or-port}, which defaults to the current input
port."
@@ -96,7 +105,7 @@ port."
;; NAMESPACES: list of (DOC-PREFIX . (USER-PREFIX . URI)).
;; A DOC-PREFIX of #f indicates that it comes from the user.
;; Otherwise, prefixes are symbols.
- (define (user-namespaces)
+ (define (munge-namespaces namespaces)
(map (lambda (el)
(match el
((prefix . uri-string)
@@ -105,6 +114,9 @@ port."
(ssax:uri-string->symbol uri-string)))))
namespaces))
+ (define (user-namespaces)
+ (munge-namespaces namespaces))
+
(define (user-entities)
(if (and default-entity-handler
(not (assq '*DEFAULT* entities)))
@@ -117,6 +129,13 @@ port."
(symbol-append prefix (string->symbol ":") local-part))
(_ name)))
+ (define (doctype-continuation seed)
+ (lambda* (#:key (entities '()) (namespaces '()))
+ (values #f
+ (append entities (user-entities))
+ (append (munge-namespaces namespaces) (user-namespaces))
+ seed)))
+
;; The SEED in this parser is the SXML: initialized to '() at each new
;; level by the fdown handlers; built in reverse by the fhere parsers;
;; and reverse-collected by the fup handlers.
@@ -159,18 +178,29 @@ port."
;;
;; SEED builds up the content.
(lambda (port docname systemid internal-subset? seed)
- (when internal-subset?
- (ssax:warn port "Internal DTD subset is not currently handled ")
- (ssax:skip-internal-dtd port))
- (ssax:warn port "DOCTYPE DECL " docname " "
- systemid " found and skipped")
- (values #f (user-entities) (user-namespaces) seed))
+ (call-with-values
+ (lambda ()
+ (cond
+ (doctype-handler
+ (doctype-handler docname systemid
+ (and internal-subset?
+ (read-internal-doctype-as-string port))))
+ (else
+ (when internal-subset?
+ (ssax:skip-internal-dtd port))
+ (values))))
+ (doctype-continuation seed)))
UNDECL-ROOT
;; This is like the DOCTYPE handler, but for documents that do not
;; have a <!DOCTYPE!> entry.
(lambda (elem-gi seed)
- (values #f (user-entities) (user-namespaces) seed))
+ (call-with-values
+ (lambda ()
+ (if doctype-handler
+ (doctype-handler #f #f #f)
+ (values)))
+ (doctype-continuation seed)))
PI
((*DEFAULT*
diff --git a/test-suite/tests/filesys.test b/test-suite/tests/filesys.test
index 8bd974d..a6bfb6e 100644
--- a/test-suite/tests/filesys.test
+++ b/test-suite/tests/filesys.test
@@ -1,6 +1,6 @@
;;;; filesys.test --- test file system functions -*- scheme -*-
;;;;
-;;;; Copyright (C) 2004, 2006, 2013 Free Software Foundation, Inc.
+;;;; Copyright (C) 2004, 2006 Free Software Foundation, Inc.
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -17,8 +17,6 @@
;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
USA
(define-module (test-suite test-filesys)
- #:use-module (ice-9 rdelim)
- #:use-module (ice-9 binary-ports)
#:use-module (test-suite lib)
#:use-module (test-suite guile-test))
@@ -129,58 +127,3 @@
(delete-file (test-file))
(delete-file (test-symlink))
-
-(let ((s "\ufeffHello, world!"))
- (define* (test-encoding encoding #:optional (ambient "ISO-8859-1"))
- (with-fluids ((%default-port-encoding ambient))
- (let* ((bytes (catch 'misc-error
- (lambda ()
- (call-with-values open-bytevector-output-port
- (lambda (port get-bytevector)
- (set-port-encoding! port encoding)
- (display s port)
- (get-bytevector))))
- (lambda args
- (throw 'unresolved))))
- (name (string-copy "myfile-XXXXXX"))
- (port (mkstemp! name)))
- (put-bytevector port bytes)
- (close-port port)
- (let ((contents (call-with-input-file name read-string)))
- (delete-file name)
- contents))))
-
- (pass-if "UTF-8"
- (equal? (test-encoding "UTF-8")
- "Hello, world!"))
-
- (pass-if "UTF-16BE"
- (equal? (test-encoding "UTF-16BE")
- "Hello, world!"))
-
- (pass-if "UTF-16LE"
- (equal? (test-encoding "UTF-16LE")
- "Hello, world!"))
-
- (pass-if "UTF-8 (ambient)"
- (equal? (test-encoding "UTF-8" "UTF-8")
- "Hello, world!"))
-
- (pass-if "UTF-8 (UTF-16 ambient)"
- (equal? (test-encoding "UTF-8" "UTF-16")
- "Hello, world!"))
-
- ;; Unicode 6.2 section 16.8:
- ;;
- ;; For compatibility with versions of the Unicode Standard prior to
- ;; Version 3.2, the code point U+FEFF has the word-joining semantics
- ;; of zero width no-break space when it is not used as a BOM. [...]
- ;;
- ;; Where the byte order is explicitly specified, such as in UTF-16BE
- ;; or UTF-16LE, then all U+FEFF characters -- even at the very
- ;; beginning of the text -- are to be interpreted as zero width
- ;; no-break spaces.
- ;;
- (pass-if "UTF-16LE (ambient)"
- (equal? (test-encoding "UTF-16LE" "UTF-16LE")
- "\ufeffHello, world!")))
diff --git a/test-suite/tests/sxml.simple.test
b/test-suite/tests/sxml.simple.test
index 623f13e..e52ba31 100644
--- a/test-suite/tests/sxml.simple.test
+++ b/test-suite/tests/sxml.simple.test
@@ -1,6 +1,6 @@
;;;; sxml.simple.test --- (sxml simple) -*- mode: scheme; coding: utf-8; -*-
;;;;
-;;;; Copyright (C) 2010 Free Software Foundation, Inc.
+;;;; Copyright (C) 2010, 2013 Free Software Foundation, Inc.
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -20,6 +20,8 @@
#:use-module (test-suite lib)
#:use-module (sxml simple))
+(define parser-error '(parser-error . ""))
+
(define %xml-sample
;; An XML sample without any space in between tags, to make it easier.
(string-append "<?xml version='1.0' encoding='utf-8'?>"
@@ -50,3 +52,84 @@
(lambda ()
(sxml->xml
(xml->sxml (open-input-string %xml-sample))))))))))
+
+(with-test-prefix "namespaces"
+ (pass-if-equal
+ (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
+ '(*TOP* (http://example.org/ns1:foo "text")))
+
+ (pass-if-equal
+ (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
+ #:namespaces '((ns1 . "http://example.org/ns1")))
+ '(*TOP* (ns1:foo "text")))
+
+ (pass-if-equal
+ (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
+ #:namespaces '((ns2 . "http://example.org/ns2")))
+ '(*TOP* (foo (ns2:baz))))
+
+ (pass-if-equal
+ (xml->sxml "<foo><ns2:baz/></foo>"
+ #:namespaces '((ns2 . "http://example.org/ns2")))
+ '(*TOP* (foo (ns2:baz))))
+
+ (pass-if-exception "namespace undeclared" parser-error
+ (xml->sxml "<foo><ns2:baz/></foo>"
+ #:namespaces '((ns2 . "http://example.org/ns2"))
+ #:declare-namespaces? #f)))
+
+(with-test-prefix "whitespace"
+ (pass-if-equal
+ (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
+ '(*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n")))
+
+ (pass-if-equal
+ (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
+ #:trim-whitespace? #t)
+ '(*TOP* (foo (bar " Alfie the parrot! ")))))
+
+(with-test-prefix "parsed entities"
+ (pass-if-equal
+ '(*TOP* (foo "&"))
+ (xml->sxml "<foo>&</foo>"))
+
+ (pass-if-exception "nbsp undefined" parser-error
+ (xml->sxml "<foo> </foo>"))
+
+ (pass-if-equal
+ '(*TOP* (foo "\xA0"))
+ (xml->sxml "<foo> </foo>"
+ #:entities '((nbsp . "\xA0"))))
+
+ (pass-if-equal
+ '(*TOP* (foo "\xA0"))
+ (xml->sxml "<foo> </foo>"))
+
+ (let ((ents '()))
+ (pass-if-equal
+ (xml->sxml "<foo> &foo;</foo>"
+ #:default-entity-handler
+ (lambda (port name)
+ (case name
+ ((nbsp) "\xa0")
+ (else
+ (set! ents (cons name ents))
+ "qux"))))
+ '(*TOP* (foo "\xa0 qux")))
+
+ (pass-if-equal
+ ents
+ '(foo))))
+
+(with-test-prefix "doctype handlers"
+ (define (handle-foo docname systemid internal-subset)
+ (case docname
+ ((foo)
+ (values #:entities '((greets . "<i>Hello, world!</i>"))))
+ (else
+ (values))))
+
+ (pass-if-equal
+ (xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
+ #:doctype-handler handle-foo)
+ '(*TOP* (p (i "Hello, world!")))))
hooks/post-receive
--
GNU Guile
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.7-73-g419c873,
Andy Wingo <=