From: Michael Gran
Subject: [Guile-commits] GNU Guile branch, master, updated. release_1-9-2-152-gbe3eb25
Date: Thu, 03 Sep 2009 16:11:06 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=be3eb25c64eeda81eeaf1356362e0eee9b5b02fb

The branch, master has been updated
       via  be3eb25c64eeda81eeaf1356362e0eee9b5b02fb (commit)
       via  bb15a36c25cc2dd7e9d3ea30b8bb6b99beed97d5 (commit)
       via  ba8477eccefae65191a5bc1de2b6f923fe195a91 (commit)
       via  719bb8cd5db10aeb0dad1c16227d6b6abc40e8b6 (commit)
       via  0dcd7e61534c9d1e33de904196cb505daf320a42 (commit)
      from  aa2cba9c882ba8bd69750b120d2b7ccd7250b562 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit be3eb25c64eeda81eeaf1356362e0eee9b5b02fb
Author: Michael Gran <address@hidden>
Date:   Thu Sep 3 09:03:53 2009 -0700

    Doc updates for srfi-14 character sets
    
    * NEWS: updates for srfi-14 character sets
    
    * doc/ref/api-data.texi: update char-set section and some spellchecking

commit bb15a36c25cc2dd7e9d3ea30b8bb6b99beed97d5
Author: Michael Gran <address@hidden>
Date:   Thu Sep 3 08:48:23 2009 -0700

    Update docs and docstrings for Unicode characters
    
    * doc/ref/api-data.texi: more info about characters and codepoints
    
    * libguile/chars.c: replace 'code point' with 'Unicode code point' in
      docstrings

commit ba8477eccefae65191a5bc1de2b6f923fe195a91
Author: Michael Gran <address@hidden>
Date:   Thu Sep 3 08:29:45 2009 -0700

    Add char-set debugging function
    
    * libguile/srfi-14.c (scm_sys_char_set_dump): new function
    
    * libguile/srfi-14.h: declaration of scm_sys_char_set_dump

commit 719bb8cd5db10aeb0dad1c16227d6b6abc40e8b6
Author: Michael Gran <address@hidden>
Date:   Thu Sep 3 08:23:24 2009 -0700

    Distinguish between all codepoints and designated codepoints in char-sets
    
    * libguile/unidata_to_charset.pl (designated): renamed from full
    
    * libguile/srfi-14.c (scm_char_set_designated): new char-set
    
    * libguile/srfi-14.i.c (cs_designated): renamed from cs_full

commit 0dcd7e61534c9d1e33de904196cb505daf320a42
Author: Michael Gran <address@hidden>
Date:   Thu Sep 3 07:47:26 2009 -0700

    Modify read and print of combining characters
    
    Since combining characters, such as accents, modify the appearance of the
    previous letter, it looks awkward in its character literal form (#\name)
    since it modified the backslash.  This instead prints the combining
    character on a small circle.
    
    * libguile/chars.h (SCM_CODEPOINT_DOTTED_CIRCLE): new #define
    
    * libguile/print.c (iprint1): print combining characters on dotted circles
    
    * libguile/read.c (scm_read_character): parse the combination of combining
      characters and dotted circles

-----------------------------------------------------------------------
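
The read/print change described in the last commit above (0dcd7e6) can be sketched in isolation. What follows is a minimal, self-contained illustration and not the libguile code. The helper names `is_combining`, `print_char_name`, and `read_char_name` are hypothetical. `is_combining` only recognizes the Combining Diacritical Marks block (U+0300 to U+036F), an assumption made to keep the sketch free of dependencies, whereas the patch itself calls `uc_combining_class`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOTTED_CIRCLE 0x25CC    /* value of SCM_CODEPOINT_DOTTED_CIRCLE */

/* Hypothetical stand-in for a real combining-class lookup: treats only
   U+0300..U+036F (Combining Diacritical Marks) as combining.  */
static int
is_combining (int32_t cp)
{
  return cp >= 0x0300 && cp <= 0x036F;
}

/* Store in OUT the code points that would follow "#\" when printing CP,
   and return how many were stored (1 or 2).  */
static size_t
print_char_name (int32_t cp, int32_t out[2])
{
  assert (out != NULL);
  if (!is_combining (cp))
    {
      out[0] = cp;
      return 1;
    }
  /* Combining character: draw it on a dotted circle, not on the
     backslash of "#\".  */
  out[0] = DOTTED_CIRCLE;
  out[1] = cp;
  return 2;
}

/* Inverse direction: given the LEN code points read after "#\", return
   the character they denote, or -1 for names this sketch ignores.  */
static int32_t
read_char_name (const int32_t *name, size_t len)
{
  assert (name != NULL);
  if (len == 1)
    return name[0];
  if (len == 2 && name[0] == DOTTED_CIRCLE)
    return name[1];
  return -1;
}
```

As in the read.c hunk further down, the reader side checks only for a leading dotted circle and a name length of two. It does not verify that the second code point is actually a combining character.
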

Summary of changes:
 NEWS                           |    7 ++
 doc/ref/api-data.texi          |  195 +++++++++++++++++++++++++++-------------
 libguile/chars.c               |   44 +++++-----
 libguile/chars.h               |    3 +-
 libguile/print.c               |   17 +++-
 libguile/read.c                |    3 +
 libguile/srfi-14.c             |   80 +++++++++++++----
 libguile/srfi-14.h             |    4 +-
 libguile/srfi-14.i.c           |    6 +-
 libguile/unidata_to_charset.pl |    6 +-
 10 files changed, 252 insertions(+), 113 deletions(-)
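
The distinction between "all code points" and "designated code points" rests on the two valid ranges that the patch spells out as `cs_full_ranges`. A small stand-alone sketch of that range table is given here, assuming a hypothetical `char_range` struct with inclusive `lo`/`hi` fields (the field names of `scm_t_char_range` are not shown in this message) and hypothetical helpers `in_ranges` and `count_code_points`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CODEPOINT_SURROGATE_START 0xD800
#define CODEPOINT_SURROGATE_END   0xDFFF
#define CODEPOINT_MAX             0x10FFFF

/* Hypothetical range type; both bounds are inclusive.  */
typedef struct
{
  int32_t lo;
  int32_t hi;
} char_range;

/* Every valid code point: everything up to U+10FFFF except the
   surrogate block U+D800..U+DFFF.  */
static const char_range full_ranges[] = {
  {0x0000, CODEPOINT_SURROGATE_START - 1},
  {CODEPOINT_SURROGATE_END + 1, CODEPOINT_MAX}
};

#define N_FULL_RANGES (sizeof full_ranges / sizeof full_ranges[0])

/* Return 1 if CP falls inside any of the LEN ranges, else 0.  */
static int
in_ranges (const char_range *ranges, size_t len, int32_t cp)
{
  size_t i;

  assert (ranges != NULL);
  for (i = 0; i < len; i++)
    if (cp >= ranges[i].lo && cp <= ranges[i].hi)
      return 1;
  return 0;
}

/* Return the total number of code points covered by the LEN ranges.  */
static long
count_code_points (const char_range *ranges, size_t len)
{
  long total = 0;
  size_t i;

  assert (ranges != NULL);
  for (i = 0; i < len; i++)
    total += (long) ranges[i].hi - ranges[i].lo + 1;
  return total;
}
```

Counting the two ranges gives 55,296 + 1,056,768 = 1,112,064 code points, which is the "about 1.1 million" figure quoted in the documentation change.
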

diff --git a/NEWS b/NEWS
index 97b55e9..955075b 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,13 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
 
 Changes in 1.9.3 (since the 1.9.2 prerelease):
 
+** SRFI-14 char-sets are modified for Unicode
+
+The default char-sets are no longer locale dependent and contain
+characters from the whole Unicode range.  There is a new char-set,
+char-set:designated, which contains all assigned Unicode characters.
+There is a new debugging function: %char-set-dump.
+
 ** Character functions operate on Unicode characters
 
 char-upcase and char-downcase use default Unicode casing rules.
diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index 446ccd3..5cbf4b1 100755
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -539,7 +539,7 @@ error.  Instead, the result of the division is either plus or minus
 infinity, depending on the sign of the divided number.
 
 The infinities are written @samp{+inf.0} and @samp{-inf.0},
-respectivly.  This syntax is also recognized by @code{read} as an
+respectively.  This syntax is also recognized by @code{read} as an
 extension to the usual Scheme syntax.
 
 Dividing zero by zero yields something that is not a number at all:
@@ -637,7 +637,7 @@ magnitude.  The argument @var{val} must be a real number.
 @end deftypefn
 
 @deftypefn {C Function} SCM scm_from_double (double val)
-Return the @code{SCM} value that representats @var{val}.  The returned
+Return the @code{SCM} value that represents @var{val}.  The returned
 value is inexact according to the predicate @code{inexact?}, but it
 will be exactly equal to @var{val}.
 @end deftypefn
@@ -1782,24 +1782,59 @@ another manual.
 In Scheme, there is a data type to describe a single character.  
 
 Defining what exactly a character @emph{is} can be more complicated
-than it seems.  Guile follows the advice of R6RS and just uses The
-Unicode Standard to help define what a character is.  So, for Guile,
-a character is anything in the Unicode Character Database.
-
-Unicode assigns each character an unique integer representation: a
address@hidden point}.  Guile uses Unicode code points as the integer
-representation of characters.  Valid code points are in the ranges 0
-to @code{#xD7FF} inclusive or @code{#xE000} to @code{#x10FFFF}
-inclusive.
+than it seems.  Guile follows the advice of R6RS and uses The Unicode
+Standard to help define what a character is.  So, for Guile, a
+character is anything in the Unicode Character Database.
+
+@cindex code point
+@cindex Unicode code point
+
+The Unicode Character Database is basically a table of characters
+indexed using integers called 'code points'.  Valid code points are in
+the ranges 0 to @code{#xD7FF} inclusive or @code{#xE000} to
+@code{#x10FFFF} inclusive, which is about 1.1 million code points.
+
+@cindex designated code point
+@cindex code point, designated
+
+Any code point that has been assigned to a character or that has
+otherwise been given a meaning by Unicode is called a 'designated code
+point'.  Most of the designated code points, about 200,000 of them,
+indicate characters, accents or other combining marks that modify
+other characters, symbols, whitespace, and control characters.  Some
+are not characters but indicators that suggest how to format or
+display neighboring characters.
+
+@cindex reserved code point
+@cindex code point, reserved
+
+If a code point is not a designated code point -- if it has not been
+assigned to a character by The Unicode Standard -- it is a 'reserved
+code point', meaning that they are reserved for future use.  Most of
+the code points, about 800,000, are 'reserved code points'.
+
+By convention, a Unicode code point is written as
+``U+XXXX'' where ``XXXX'' is a hexadecimal number.  Please note that
+this convenient notation is not valid code.  Guile does not interpret
+``U+XXXX'' as a character.
 
 In Scheme, a character literal is written as @code{#\@var{name}} where
 @var{name} is the name of the character that you want.  Printable
 characters have their usual single character name; for example,
-@code{#\a} is a lower case @code{a}.  Many of the non-printing
-characters, such as whitespace characters and control characters, also
-have names.
+@code{#\a} is a lower case @code{a}.  
+
+Some of the code points are 'combining characters' that are not meant
+to be printed by themselves but are instead meant to modify the
+appearance of the previous character.  For combining characters, an
+alternate form of the character literal is @code{#\} followed by
+U+25CC (a small, dotted circle), followed by the combining character.
+This allows the combining character to be drawn on the circle, not on
+the backslash of @code{#\}.
 
-The most commonly used non-printing chararacters are space and
+Many of the non-printing characters, such as whitespace characters and
+control characters, also have names.
+
+The most commonly used non-printing characters are space and
 newline.  Their character names are @code{#\space} and
 @code{#\newline}.  There are also names for all of the ``C0 control
 characters'' (those with code points below 32).  The following table
@@ -1841,7 +1876,7 @@ describes the names for each character.
 @item 32 = @code{#\sp}
 @end multitable
 
-The ``delete'' character (code point 127) may be referred to with the
+The ``delete'' character (code point U+007F) may be referred to with the
 name @code{#\del}.
 
 One might note that the space character has two names --
@@ -1862,8 +1897,9 @@ sake of compatibility with previous versions.
 @item @code{#\null} @tab @code{#\nul}
 @end multitable
 
-Characters may also be referred to with an octal value, such as
-@code{#\10} for @code{#\bs} or @code{#\177} for @code{#\del}.
+Characters may also be written using their code point values.  They can
+be written with as an octal number, such as @code{#\10} for
+@code{#\bs} or @code{#\177} for @code{#\del}.
 
 @rnindex char?
 @deffn {Scheme Procedure} char? x
@@ -1871,7 +1907,7 @@ Characters may also be referred to with an octal value, such as
 Return @code{#t} iff @var{x} is a character, else @code{#f}.
 @end deffn
 
-Fundamentally, the character comparisons operations below are
+Fundamentally, the character comparison operations below are
 numeric comparisons of the character's code points.
 
 @rnindex char=?
@@ -1904,12 +1940,17 @@ Return @code{#t} iff the code point of @var{x} is greater than or
 equal to the code point of @var{y}, else @code{#f}.
 @end deffn
 
-Case-insensitive character comparisons of characters use @emph{Unicode
-case folding}.  In case folding comparisons, if a character is
-lowercase and has an uppercase form that can be expressed as a single
-character, it is converted to uppercase before comparison.  Unicode
-case folding is language independent: it uses rules that are generally
-true, but, it cannot cover all cases for all languages.
+@cindex case folding
+
+Case-insensitive character comparisons use @emph{Unicode case
+folding}.  In case folding comparisons, if a character is lowercase
+and has an uppercase form that can be expressed as a single character,
+it is converted to uppercase before comparison.  All other characters
+undergo no conversion before the comparison occurs.  This includes the
+German sharp S (Eszett) which is not uppercased before conversion
+because its uppercase form has two characters.  Unicode case folding
+is language independent: it uses rules that are generally true, but,
+it cannot cover all cases for all languages.
 
 @rnindex char-ci=?
 @deffn {Scheme Procedure} char-ci=? x y
@@ -2018,12 +2059,6 @@ handling them are provided.
 Character sets can be created, extended, tested for the membership of a
 characters and be compared to other character sets.
 
-The Guile implementation of character sets currently deals only with
-8-bit characters.  In the future, when Guile gets support for
-international character sets, this will change, but the functions
-provided here will always then be able to efficiently cope with very
-large character sets.
-
 @menu
 * Character Set Predicates/Comparison::
 * Iterating Over Character Sets::  Enumerate charset elements.
@@ -2222,7 +2257,7 @@ character codes lie in the half-open range
 If @var{error} is a true value, an error is signalled if the
 specified range contains characters which are not contained in
 the implemented character range.  If @var{error} is @code{#f},
-these characters are silently left out of the resultung
+these characters are silently left out of the resulting
 character set.
 
 The characters in @var{base_cs} are added to the result, if
@@ -2238,7 +2273,7 @@ character codes lie in the half-open range
 If @var{error} is a true value, an error is signalled if the
 specified range contains characters which are not contained in
 the implemented character range.  If @var{error} is @code{#f},
-these characters are silently left out of the resultung
+these characters are silently left out of the resulting
 character set.
 
 The characters are added to @var{base_cs} and @var{base_cs} is
@@ -2247,7 +2282,10 @@ returned.
 
 @deffn {Scheme Procedure} ->char-set x
 @deffnx {C Function} scm_to_char_set (x)
-Coerces x into a char-set. @var{x} may be a string, character or char-set. A string is converted to the set of its constituent characters; a character is converted to a singleton set; a char-set is returned as-is.
+Coerces x into a char-set. @var{x} may be a string, character or
+char-set. A string is converted to the set of its constituent
+characters; a character is converted to a singleton set; a char-set is
+returned as-is.
 @end deffn
 
 @c ===================================================================
@@ -2258,6 +2296,23 @@ Coerces x into a char-set. @var{x} may be a string, character or char-set. A str
 Access the elements and other information of a character set with these
 procedures.
 
+@deffn {Scheme Procedure} %char-set-dump cs
+Returns an association list containing debugging information
+for @var{cs}. The association list has the following entries.
+@table @code
+@item char-set
+The char-set itself
+@item len
+The number of groups of contiguous code points the char-set
+contains
+@item ranges
+A list of lists where each sublist is a range of code points
+and their associated characters
+@end table
+The return value of this function cannot be relied upon to be
+consistent between versions of Guile and should not be used in code.
+@end deffn
+
 @deffn {Scheme Procedure} char-set-size cs
 @deffnx {C Function} scm_char_set_size (cs)
 Return the number of elements in character set @var{cs}.
@@ -2339,6 +2394,12 @@ must be a character set.
 Return the complement of the character set @var{cs}.
 @end deffn
 
+Note that the complement of a character set is likely to contain many
+reserved code points (code points that are not associated with
+characters).  It may be helpful to modify the output of
+@code{char-set-complement} by computing its intersection with the set
+of designated code points, @code{char-set:designated}.
+
 @deffn {Scheme Procedure} char-set-union . rest
 @deffnx {C Function} scm_char_set_union (rest)
 Return the union of all argument character sets.
@@ -2408,12 +2469,10 @@ useful, several predefined character set variables exist.
 @cindex charset
 @cindex locale
 
-Currently, the contents of these character sets are recomputed upon a
-successful @code{setlocale} call (@pxref{Locales}) in order to reflect
-the characters available in the current locale's codeset.  For
-instance, @code{char-set:letter} contains 52 characters under an ASCII
-locale (e.g., the default @code{C} locale) and 117 characters under an
-ISO-8859-1 (``Latin-1'') locale.
+These character sets are locale independent and are not recomputed
+upon a @code{setlocale} call.  They contain characters from the whole
+range of Unicode code points. For instance, @code{char-set:letter}
+contains about 94,000 characters.
 
 @defvr {Scheme Variable} char-set:lower-case
 @defvrx {C Variable} scm_char_set_lower_case
@@ -2427,13 +2486,16 @@ All upper-case characters.
 
 @defvr {Scheme Variable} char-set:title-case
 @defvrx {C Variable} scm_char_set_title_case
-This is empty, because ASCII has no titlecase characters.
+All single characters that function as if they were an upper-case
+letter followed by a lower-case letter.
 @end defvr
 
 @defvr {Scheme Variable} char-set:letter
 @defvrx {C Variable} scm_char_set_letter
-All letters, e.g. the union of @code{char-set:lower-case} and
-@code{char-set:upper-case}.
+All letters.  This includes @code{char-set:lower-case},
+@code{char-set:upper-case}, @code{char-set:title-case}, and many
+letters that have no case at all.  For example, Chinese and Japanese
+characters typically have no concept of case.
 @end defvr
 
 @defvr {Scheme Variable} char-set:digit
@@ -2463,23 +2525,26 @@ All whitespace characters.
 
 @defvr {Scheme Variable} char-set:blank
 @defvrx {C Variable} scm_char_set_blank
-All horizontal whitespace characters, that is @code{#\space} and
-@code{#\tab}.
+All horizontal whitespace characters, which notably includes
+@code{#\space} and @code{#\tab}.
 @end defvr
 
 @defvr {Scheme Variable} char-set:iso-control
 @defvrx {C Variable} scm_char_set_iso_control
-The ISO control characters with the codes 0--31 and 127.
+The ISO control characters are the C0 control characters (U+0000 to
+U+001F), delete (U+007F), and the C1 control characters (U+0080 to
+U+009F).
 @end defvr
 
 @defvr {Scheme Variable} char-set:punctuation
 @defvrx {C Variable} scm_char_set_punctuation
-The characters @code{!"#%&'()*,-./:;?@@address@hidden@}}
+All punctuation characters, such as the characters
address@hidden"#%&'()*,-./:;?@@address@hidden@}}
 @end defvr
 
 @defvr {Scheme Variable} char-set:symbol
 @defvrx {C Variable} scm_char_set_symbol
-The characters @code{$+<=>^`|~}.
+All symbol characters, such as the characters @code{$+<=>^`|~}.
 @end defvr
 
 @defvr {Scheme Variable} char-set:hex-digit
@@ -2497,9 +2562,17 @@ All ASCII characters.
 The empty character set.
 @end defvr
 
+@defvr {Scheme Variable} char-set:designated
+@defvrx {C Variable} scm_char_set_designated
+This character set contains all designated code points.  This includes
+all the code points to which Unicode has assigned a character or other
+meaning.
+@end defvr
+
 @defvr {Scheme Variable} char-set:full
 @defvrx {C Variable} scm_char_set_full
-This character set contains all possible characters.
+This character set contains all possible code points.  This includes
+both designated and reserved code points.
 @end defvr
 
 @node Strings
@@ -2527,7 +2600,7 @@ memory.
 
 When one of these two strings is modified, as with @code{string-set!},
 their common memory does get copied so that each string has its own
-memory and modifying one does not accidently modify the other as well.
+memory and modifying one does not accidentally modify the other as well.
 Thus, Guile's strings are `copy on write'; the actual copying of their
 memory is delayed until one string is written to.
 
@@ -2947,7 +3020,7 @@ characters.
 @deffnx {C Function} scm_string_trim (s, char_pred, start, end)
 @deffnx {C Function} scm_string_trim_right (s, char_pred, start, end)
 @deffnx {C Function} scm_string_trim_both (s, char_pred, start, end)
-Trim occurrances of @var{char_pred} from the ends of @var{s}.
+Trim occurrences of @var{char_pred} from the ends of @var{s}.
 
 @code{string-trim} trims @var{char_pred} characters from the left
 (start) of the string, @code{string-trim-right} trims them from the
@@ -3229,14 +3302,14 @@ Compute a hash value for @var{S}.  the optional argument @var{bound} is a non-ne
 @deffn {Scheme Procedure} string-index s char_pred [start [end]]
 @deffnx {C Function} scm_string_index (s, char_pred, start, end)
 Search through the string @var{s} from left to right, returning
-the index of the first occurence of a character which
+the index of the first occurrence of a character which
 
 @itemize @bullet
 @item
 equals @var{char_pred}, if it is character,
 
 @item
-satisifies the predicate @var{char_pred}, if it is a procedure,
+satisfies the predicate @var{char_pred}, if it is a procedure,
 
 @item
 is in the set @var{char_pred}, if it is a character set.
@@ -3246,14 +3319,14 @@ is in the set @var{char_pred}, if it is a character set.
 @deffn {Scheme Procedure} string-rindex s char_pred [start [end]]
 @deffnx {C Function} scm_string_rindex (s, char_pred, start, end)
 Search through the string @var{s} from right to left, returning
-the index of the last occurence of a character which
+the index of the last occurrence of a character which
 
 @itemize @bullet
 @item
 equals @var{char_pred}, if it is character,
 
 @item
-satisifies the predicate @var{char_pred}, if it is a procedure,
+satisfies the predicate @var{char_pred}, if it is a procedure,
 
 @item
 is in the set if @var{char_pred} is a character set.
@@ -3307,14 +3380,14 @@ Is @var{s1} a suffix of @var{s2}, ignoring character case?
 @deffn {Scheme Procedure} string-index-right s char_pred [start [end]]
 @deffnx {C Function} scm_string_index_right (s, char_pred, start, end)
 Search through the string @var{s} from right to left, returning
-the index of the last occurence of a character which
+the index of the last occurrence of a character which
 
 @itemize @bullet
 @item
 equals @var{char_pred}, if it is character,
 
 @item
-satisifies the predicate @var{char_pred}, if it is a procedure,
+satisfies the predicate @var{char_pred}, if it is a procedure,
 
 @item
 is in the set if @var{char_pred} is a character set.
@@ -3324,14 +3397,14 @@ is in the set if @var{char_pred} is a character set.
 @deffn {Scheme Procedure} string-skip s char_pred [start [end]]
 @deffnx {C Function} scm_string_skip (s, char_pred, start, end)
 Search through the string @var{s} from left to right, returning
-the index of the first occurence of a character which
+the index of the first occurrence of a character which
 
 @itemize @bullet
 @item
 does not equal @var{char_pred}, if it is character,
 
 @item
-does not satisify the predicate @var{char_pred}, if it is a
+does not satisfy the predicate @var{char_pred}, if it is a
 procedure,
 
 @item
@@ -3342,7 +3415,7 @@ is not in the set if @var{char_pred} is a character set.
 @deffn {Scheme Procedure} string-skip-right s char_pred [start [end]]
 @deffnx {C Function} scm_string_skip_right (s, char_pred, start, end)
 Search through the string @var{s} from right to left, returning
-the index of the last occurence of a character which
+the index of the last occurrence of a character which
 
 @itemize @bullet
 @item
@@ -3367,7 +3440,7 @@ Return the count of the number of characters in the string
 equals @var{char_pred}, if it is character,
 
 @item
-satisifies the predicate @var{char_pred}, if it is a procedure.
+satisfies the predicate @var{char_pred}, if it is a procedure.
 
 @item
 is in the set @var{char_pred}, if it is a character set.
diff --git a/libguile/chars.c b/libguile/chars.c
index c2feaa6..59ac6f4 100644
--- a/libguile/chars.c
+++ b/libguile/chars.c
@@ -45,8 +45,8 @@ SCM_DEFINE (scm_char_p, "char?", 1, 0, 0,
 
 SCM_DEFINE1 (scm_char_eq_p, "char=?", scm_tc7_rpsubr,
              (SCM x, SCM y),
-             "Return @code{#t} iff code point of @var{x} is equal to the code point\n"
-             "of @var{y}, else @code{#f}.\n")
+             "Return @code{#t} if the Unicode code point of @var{x} is equal to the\n"
+             "code point of @var{y}, else @code{#f}.\n")
 #define FUNC_NAME s_scm_char_eq_p
 {
   SCM_VALIDATE_CHAR (1, x);
@@ -70,8 +70,8 @@ SCM_DEFINE1 (scm_char_less_p, "char<?", scm_tc7_rpsubr,
 
 SCM_DEFINE1 (scm_char_leq_p, "char<=?", scm_tc7_rpsubr,
              (SCM x, SCM y),
-             "Return @code{#t} iff the code point of @var{x} is less than or equal\n"
-             "to the code point of @var{y}, else @code{#f}.")
+             "Return @code{#t} if the Unicode code point of @var{x} is less than or\n"
+             "equal to the code point of @var{y}, else @code{#f}.")
 #define FUNC_NAME s_scm_char_leq_p
 {
   SCM_VALIDATE_CHAR (1, x);
@@ -82,8 +82,8 @@ SCM_DEFINE1 (scm_char_leq_p, "char<=?", scm_tc7_rpsubr,
 
 SCM_DEFINE1 (scm_char_gr_p, "char>?", scm_tc7_rpsubr,
              (SCM x, SCM y),
-             "Return @code{#t} iff the code point of @var{x} is greater than the\n"
-             "code point of @var{y}, else @code{#f}.")
+             "Return @code{#t} if the Unicode code point of @var{x} is greater than\n"
+             "the code point of @var{y}, else @code{#f}.")
 #define FUNC_NAME s_scm_char_gr_p
 {
   SCM_VALIDATE_CHAR (1, x);
@@ -94,8 +94,8 @@ SCM_DEFINE1 (scm_char_gr_p, "char>?", scm_tc7_rpsubr,
 
 SCM_DEFINE1 (scm_char_geq_p, "char>=?", scm_tc7_rpsubr,
              (SCM x, SCM y),
-             "Return @code{#t} iff the code point of @var{x} is greater than or\n"
-             "equal to the code point of @var{y}, else @code{#f}.")
+             "Return @code{#t} if the Unicode code point of @var{x} is greater than\n"
+             "or equal to the code point of @var{y}, else @code{#f}.")
 #define FUNC_NAME s_scm_char_geq_p
 {
   SCM_VALIDATE_CHAR (1, x);
@@ -113,8 +113,8 @@ SCM_DEFINE1 (scm_char_geq_p, "char>=?", scm_tc7_rpsubr,
 
 SCM_DEFINE1 (scm_char_ci_eq_p, "char-ci=?", scm_tc7_rpsubr,
              (SCM x, SCM y),
-             "Return @code{#t} iff the case-folded code point of @var{x} is the same\n"
-             "as the case-folded code point of @var{y}, else @code{#f}.")
+             "Return @code{#t} if the case-folded Unicode code point of @var{x} is\n"
+             "the same as the case-folded code point of @var{y}, else @code{#f}.")
 #define FUNC_NAME s_scm_char_ci_eq_p
 {
   SCM_VALIDATE_CHAR (1, x);
@@ -125,8 +125,8 @@ SCM_DEFINE1 (scm_char_ci_eq_p, "char-ci=?", scm_tc7_rpsubr,
 
 SCM_DEFINE1 (scm_char_ci_less_p, "char-ci<?", scm_tc7_rpsubr,
              (SCM x, SCM y),
-             "Return @code{#t} iff the case-folded code point of @var{x} is less\n"
-             "than the case-folded code point of @var{y}, else @code{#f}.")
+             "Return @code{#t} if the case-folded Unicode code point of @var{x} is\n"
+             "less than the case-folded code point of @var{y}, else @code{#f}.")
 #define FUNC_NAME s_scm_char_ci_less_p
 {
   SCM_VALIDATE_CHAR (1, x);
@@ -137,8 +137,8 @@ SCM_DEFINE1 (scm_char_ci_less_p, "char-ci<?", scm_tc7_rpsubr,
 
 SCM_DEFINE1 (scm_char_ci_leq_p, "char-ci<=?", scm_tc7_rpsubr,
              (SCM x, SCM y),
-             "Return @code{#t} iff the case-folded code point of @var{x} is less\n"
-             "than or equal to the case-folded code point of @var{y}, else\n"
+             "Return @code{#t} iff the case-folded Unicode code point of @var{x} is\n"
+             "less than or equal to the case-folded code point of @var{y}, else\n"
              "@code{#f}")
 #define FUNC_NAME s_scm_char_ci_leq_p
 {
@@ -162,8 +162,8 @@ SCM_DEFINE1 (scm_char_ci_gr_p, "char-ci>?", scm_tc7_rpsubr,
 
 SCM_DEFINE1 (scm_char_ci_geq_p, "char-ci>=?", scm_tc7_rpsubr,
              (SCM x, SCM y),
-             "Return @code{#t} iff the case-folded code point of @var{x} is greater\n"
-             "than or equal to the case-folded code point of @var{y}, else\n"
+             "Return @code{#t} iff the case-folded Unicode code point of @var{x} is\n"
+             "greater than or equal to the case-folded code point of @var{y}, else\n"
              "@code{#f}.")
 #define FUNC_NAME s_scm_char_ci_geq_p
 {
@@ -224,7 +224,8 @@ SCM_DEFINE (scm_char_lower_case_p, "char-lower-case?", 1, 0, 0,
 
 SCM_DEFINE (scm_char_is_both_p, "char-is-both?", 1, 0, 0, 
             (SCM chr),
-           "Return @code{#t} iff @var{chr} is either uppercase or lowercase, else @code{#f}.\n")
+           "Return @code{#t} iff @var{chr} is either uppercase or lowercase, else\n"
+            "@code{#f}.\n")
 #define FUNC_NAME s_scm_char_is_both_p
 {
   if (scm_is_true (scm_char_set_contains_p (scm_char_set_lower_case, chr)))
@@ -236,7 +237,7 @@ SCM_DEFINE (scm_char_is_both_p, "char-is-both?", 1, 0, 0,
 
 SCM_DEFINE (scm_char_to_integer, "char->integer", 1, 0, 0, 
             (SCM chr),
-            "Return the code point of @var{chr}.")
+            "Return the Unicode code point of @var{chr}.")
 #define FUNC_NAME s_scm_char_to_integer
 {
   SCM_VALIDATE_CHAR (1, chr);
@@ -247,9 +248,10 @@ SCM_DEFINE (scm_char_to_integer, "char->integer", 1, 0, 0,
 
 SCM_DEFINE (scm_integer_to_char, "integer->char", 1, 0, 0, 
            (SCM n),
-            "Return the character that has code point @var{n}.  The integer @var{n}\n"
-            "must be a valid code point.  Valid code points are in the ranges 0 to\n"
-            "@code{#xD7FF} inclusive or @code{#xE000} to @code{#x10FFFF} inclusive.")
+            "Return the character that has Unicode code point @var{n}.  The integer\n"
+            "@var{n} must be a valid code point.  Valid code points are in the\n"
+            "ranges 0 to @code{#xD7FF} inclusive or @code{#xE000} to\n"
+            "@code{#x10FFFF} inclusive.")
 #define FUNC_NAME s_scm_integer_to_char
 {
   scm_t_wchar cn;
diff --git a/libguile/chars.h b/libguile/chars.h
index 69ef8d0..04eb9f0 100644
--- a/libguile/chars.h
+++ b/libguile/chars.h
@@ -47,9 +47,10 @@ typedef scm_t_int32 scm_t_wchar;
    ? SCM_MAKE_ITAG8 ((scm_t_bits) (unsigned char) (x), scm_tc8_char)    \
    : SCM_MAKE_ITAG8 ((scm_t_bits) (x), scm_tc8_char))
 
-#define SCM_CODEPOINT_MAX (0x10ffff)
+#define SCM_CODEPOINT_DOTTED_CIRCLE (0x25cc)
 #define SCM_CODEPOINT_SURROGATE_START (0xd800)
 #define SCM_CODEPOINT_SURROGATE_END (0xdfff)
+#define SCM_CODEPOINT_MAX (0x10ffff)
 #define SCM_IS_UNICODE_CHAR(c)                                          \
   (((scm_t_wchar) (c) >= 0                                              \
     && (scm_t_wchar) (c) < SCM_CODEPOINT_SURROGATE_START)               \
diff --git a/libguile/print.c b/libguile/print.c
index 86d067b..f4826d4 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -463,13 +463,26 @@ iprin1 (SCM exp, SCM port, scm_print_state *pstate)
                 /* Print the character if is graphic character.  */
                 {
                   scm_t_wchar *wbuf;
-                  SCM wstr = scm_i_make_wide_string (1, &wbuf);
+                  SCM wstr;
                   char *buf;
                   size_t len;
                   const char *enc;
 
                   enc = scm_i_get_port_encoding (port);
-                  wbuf[0] = i;
+                  if (uc_combining_class (i) == UC_CCC_NR)
+                    {
+                      wstr = scm_i_make_wide_string (1, &wbuf);
+                      wbuf[0] = i;
+                    }
+                  else
+                    {
+                      /* Character is a combining character: print it connected
+                         to a dotted circle instead of connecting it to the 
+                         backslash in '#\'  */
+                      wstr = scm_i_make_wide_string (2, &wbuf);
+                      wbuf[0] = SCM_CODEPOINT_DOTTED_CIRCLE;
+                      wbuf[1] = i;
+                    }
                   if (enc == NULL)
                     {
                       if (i <= 0xFF)
diff --git a/libguile/read.c b/libguile/read.c
index b2773cd..269e96b 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -844,6 +844,9 @@ scm_read_character (scm_t_wchar chr, SCM port)
     return SCM_MAKE_CHAR (scm_i_string_ref (charname, 0));
 
   cp = scm_i_string_ref (charname, 0);
+  if (cp == SCM_CODEPOINT_DOTTED_CIRCLE && charname_len == 2)
+    return SCM_MAKE_CHAR (scm_i_string_ref (charname, 1));
+
   if (cp >= '0' && cp < '8')
     {
       /* Dirk:FIXME::  This type of character syntax is not R5RS
diff --git a/libguile/srfi-14.c b/libguile/srfi-14.c
index 33b508d..5751bbe 100644
--- a/libguile/srfi-14.c
+++ b/libguile/srfi-14.c
@@ -34,6 +34,18 @@
 /* Include the pre-computed standard charset data.  */
 #include "libguile/srfi-14.i.c"
 
+scm_t_char_range cs_full_ranges[] = {
+  {0x0000, SCM_CODEPOINT_SURROGATE_START - 1}
+  ,
+  {SCM_CODEPOINT_SURROGATE_END + 1, SCM_CODEPOINT_MAX}
+};
+
+scm_t_char_set cs_full = {
+  2,
+  cs_full_ranges
+};
+
+
 #define SCM_CHARSET_DATA(charset) ((scm_t_char_set *) SCM_SMOB_DATA (charset))
 
 #define SCM_CHARSET_SET(cs, idx)                        \
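
A self-contained sketch of what the new two-range cs_full table covers: every code point up to the maximum, minus the surrogate block. The struct and function names are hypothetical mirrors of scm_t_char_range, and the numeric values (0xD800, 0xDFFF, 0x10FFFF) are assumptions standing in for the SCM_CODEPOINT_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values of the SCM_CODEPOINT_* constants.  */
#define SURROGATE_START 0xD800u
#define SURROGATE_END   0xDFFFu
#define CODEPOINT_MAX   0x10FFFFu

/* Hypothetical mirror of scm_t_char_range: an inclusive range.  */
typedef struct
{
  uint32_t lo;
  uint32_t hi;
} char_range;

static const char_range full_ranges[] = {
  {0x0000, SURROGATE_START - 1},
  {SURROGATE_END + 1, CODEPOINT_MAX}
};

/* Sanity check: the two ranges are ordered and do not overlap.  */
static void
check_full_ranges (void)
{
  assert (full_ranges[0].lo <= full_ranges[0].hi);
  assert (full_ranges[0].hi < full_ranges[1].lo);
  assert (full_ranges[1].lo <= full_ranges[1].hi);
}

/* Return 1 if C falls in one of the ranges, 0 otherwise.  */
static int
in_full_set (uint32_t c)
{
  size_t i;
  for (i = 0; i < sizeof full_ranges / sizeof full_ranges[0]; i++)
    if (c >= full_ranges[i].lo && c <= full_ranges[i].hi)
      return 1;
  return 0;
}
```

Under these assumptions the set contains unassigned code points as well, which is what separates it from the table-generated set that this commit renames to "designated".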
@@ -2025,6 +2037,7 @@ SCM scm_char_set_hex_digit;
 SCM scm_char_set_blank;
 SCM scm_char_set_ascii;
 SCM scm_char_set_empty;
+SCM scm_char_set_designated;
 SCM scm_char_set_full;
 
 
@@ -2039,31 +2052,59 @@ define_charset (const char *name, const scm_t_char_set *p)
   return scm_permanent_object (cs);
 }
 
-#ifdef SCM_CHARSET_DEBUG
-SCM_DEFINE (scm_debug_char_set, "debug-char-set", 1, 0, 0,
-            (SCM charset),
-            "Print out the internal C structure of @var{charset}.\n")
-#define FUNC_NAME s_scm_debug_char_set
-{
-  int i;
-  scm_t_char_set *cs = SCM_CHARSET_DATA (charset);
-  fprintf (stderr, "cs %p\n", cs);
-  fprintf (stderr, "len %d\n", cs->len);
-  fprintf (stderr, "arr %p\n", cs->ranges);
+SCM_DEFINE (scm_sys_char_set_dump, "%char-set-dump", 1, 0, 0, (SCM charset), 
+            "Returns an association list containing debugging information\n"
+            "for @var{charset}. The association list has the following entries."
+            "@table @code\n"
+            "@item char-set\n"
+            "The char-set itself.\n"
+            "@item len\n"
+            "The number of character ranges the char-set contains\n"
+            "@item ranges\n"
+            "A list of lists where each sublist is a range of code points\n"
+            "and their associated characters\n"
+            "@end table")
+#define FUNC_NAME s_scm_sys_char_set_dump
+{
+  SCM e1, e2, e3;
+  SCM ranges = SCM_EOL, elt;
+  size_t i;
+  scm_t_char_set *cs;
+  char codepoint_string_lo[9], codepoint_string_hi[9];
+
+  SCM_VALIDATE_SMOB (1, charset, charset);
+  cs = SCM_CHARSET_DATA (charset);
+
+  e1 = scm_cons (scm_from_locale_symbol ("char-set"),
+                 charset);
+  e2 = scm_cons (scm_from_locale_symbol ("n"),
+                 scm_from_size_t (cs->len));
+
   for (i = 0; i < cs->len; i++)
     {
-      if (cs->ranges[i].lo == cs->ranges[i].hi)
-        fprintf (stderr, "%04x\n", cs->ranges[i].lo);
+      if (cs->ranges[i].lo > 0xFFFF)
+        sprintf (codepoint_string_lo, "U+%06x", cs->ranges[i].lo);
+      else
+        sprintf (codepoint_string_lo, "U+%04x", cs->ranges[i].lo);
+      if (cs->ranges[i].hi > 0xFFFF)
+        sprintf (codepoint_string_hi, "U+%06x", cs->ranges[i].hi);
       else
-        fprintf (stderr, "%04x..%04x\t[%d]\n",
-                 cs->ranges[i].lo,
-                 cs->ranges[i].hi, cs->ranges[i].hi - cs->ranges[i].lo + 1);
+        sprintf (codepoint_string_hi, "U+%04x", cs->ranges[i].hi);
+
+      elt = scm_list_4 (SCM_MAKE_CHAR (cs->ranges[i].lo),
+                            SCM_MAKE_CHAR (cs->ranges[i].hi),
+                            scm_from_locale_string (codepoint_string_lo),
+                            scm_from_locale_string (codepoint_string_hi));
+      ranges = scm_append (scm_list_2 (ranges,
+                                       scm_list_1 (elt)));
     }
-  printf ("\n");
-  return SCM_UNSPECIFIED;
+  e3 = scm_cons (scm_from_locale_symbol ("ranges"),
+                 ranges);
+
+  return scm_list_3 (e1, e2, e3);
 }
 #undef FUNC_NAME
-#endif /* SCM_CHARSET_DEBUG */
+
 
 
 
@@ -2102,6 +2143,7 @@ scm_init_srfi_14 (void)
   scm_char_set_blank = define_charset ("char-set:blank", &cs_blank);
   scm_char_set_ascii = define_charset ("char-set:ascii", &cs_ascii);
   scm_char_set_empty = define_charset ("char-set:empty", &cs_empty);
+  scm_char_set_designated = define_charset ("char-set:designated", &cs_designated);
   scm_char_set_full = define_charset ("char-set:full", &cs_full);
 
 #include "libguile/srfi-14.x"
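
The code point labels built in the %char-set-dump hunk above use four hex digits inside the Basic Multilingual Plane and six above it. A minimal sketch of that formatting rule follows; the function name is hypothetical, and snprintf is swapped in for the patch's sprintf as a defensive choice. A 9-byte buffer, as declared in the patch, is enough for the longest label, "U+10ffff", plus the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write "U+xxxx" for CP, or "U+xxxxxx" when CP is above 0xFFFF,
   into BUF of SIZE bytes.  Returns the label length that snprintf
   reports.  */
static int
format_codepoint (uint32_t cp, char *buf, size_t size)
{
  assert (buf != NULL && cp <= 0x10FFFF);
  if (cp > 0xFFFF)
    return snprintf (buf, size, "U+%06x", (unsigned int) cp);
  return snprintf (buf, size, "U+%04x", (unsigned int) cp);
}
```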
diff --git a/libguile/srfi-14.h b/libguile/srfi-14.h
index 1b9c295..4b1a4b2 100644
--- a/libguile/srfi-14.h
+++ b/libguile/srfi-14.h
@@ -100,9 +100,7 @@ SCM_API SCM scm_char_set_intersection_x (SCM cs1, SCM rest);
 SCM_API SCM scm_char_set_difference_x (SCM cs1, SCM rest);
 SCM_API SCM scm_char_set_xor_x (SCM cs1, SCM rest);
 SCM_API SCM scm_char_set_diff_plus_intersection_x (SCM cs1, SCM cs2, SCM rest);
-#if SCM_CHARSET_DEBUG
-SCM_API SCM scm_debug_char_set (SCM cs);
-#endif /* SCM_CHARSET_DEBUG */
+SCM_API SCM scm_sys_char_set_dump (SCM charset);
 
 SCM_API SCM scm_char_set_lower_case;
 SCM_API SCM scm_char_set_upper_case;
diff --git a/libguile/srfi-14.i.c b/libguile/srfi-14.i.c
index d92b4d7..fd537da 100644
--- a/libguile/srfi-14.i.c
+++ b/libguile/srfi-14.i.c
@@ -6253,7 +6253,7 @@ scm_t_char_set cs_empty = {
   cs_empty_ranges
 };
 
-scm_t_char_range cs_full_ranges[] = {
+scm_t_char_range cs_designated_ranges[] = {
   {0x0000, 0x0377}
   ,
   {0x037a, 0x037e}
@@ -7145,7 +7145,7 @@ scm_t_char_range cs_full_ranges[] = {
   {0x100000, 0x10fffd}
 };
 
-scm_t_char_set cs_full = {
+scm_t_char_set cs_designated = {
   445,
-  cs_full_ranges
+  cs_designated_ranges
 };
diff --git a/libguile/unidata_to_charset.pl b/libguile/unidata_to_charset.pl
index 61c8d10..d086c8e 100755
--- a/libguile/unidata_to_charset.pl
+++ b/libguile/unidata_to_charset.pl
@@ -254,8 +254,8 @@ sub empty {
     return 0;
 }
 
-# Full -- All characters except for the surrogates
-sub full {
+# Designated -- All characters except for the surrogates
+sub designated {
     my($codepoint, $name, $category, $uppercase, $lowercase)= @_;
     if ($category =~ (/Cs/)) {
         return 0;
@@ -387,7 +387,7 @@ compute "symbol";
 compute "blank";
 compute "ascii";
 compute "empty";
-compute "full";
+compute "designated";
 
 close $in;
 close $out;


hooks/post-receive
-- 
GNU Guile



