Re: fencepost error in encoding processing
From: Ken Raeburn
Subject: Re: fencepost error in encoding processing
Date: Mon, 16 Nov 2009 12:25:17 -0500
On Nov 16, 2009, at 08:03, Ludovic Courtès wrote:
> As far as encoding names are concerned, Bruno Haible pointed me to
> http://www.iana.org/assignments/character-sets and I added a link to
> it in the manual a couple of days ago.
Between your link and Mike's, it looks to me like we should add
several more characters.
The GNU libc code adds ":" and "," to the list. The comment in
iconv_open doesn't list the comma, but the function it calls does
permit it. There's also some special handling of "/".
The IANA list shows names using "+" and parens ("ebcdic-us-37+euro",
"NF_Z_62-010_(1973)"), as well as colons.
I've skimmed the ICU page Mike pointed to, and it includes names like
"UTF-16BE,version=1" and "ibm-1149_P100-197,swaplfnl" as well as "+"
and ":" names, when showing "all aliases". If we only try to support,
say, IANA and MIME, then "+" and ":" are used but not "=".
Since we're scanning an Emacs-style coding specification, as long as
whitespace and semicolon aren't on the list, I think we can be
expansive, so let's go ahead and add all of ":,+=/()" to the
allowed set. The results will still be constrained by whatever the OS
supports; we just don't want Guile to impose additional constraints.
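As a rough illustration of the explicit-set approach (a standalone sketch, not the libguile code; the helper names encoding_name_char_p and encoding_name_length are made up for this example), the test could be factored out as below. Two details are assumptions beyond what the patch does: the character is cast to unsigned char before calling isalnum, and NUL is rejected up front, because strchr also matches the terminator of the string it searches.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libguile: return 1 if C may appear
   in a coding system name, 0 otherwise.  */
static int
encoding_name_char_p (char c)
{
  /* strchr (s, '\0') returns a pointer to the terminator of S, so a
     NUL byte has to be rejected explicitly.  */
  if (c == '\0')
    return 0;

  /* The <ctype.h> functions require a value representable as
     unsigned char (or EOF).  */
  return isalnum ((unsigned char) c) || strchr ("_-.:/,+=()", c) != NULL;
}

/* Hypothetical helper: length of the leading run of S, looking at no
   more than LEN bytes, that consists of allowed name characters.  */
static size_t
encoding_name_length (const char *s, size_t len)
{
  size_t i = 0;

  while (i < len && encoding_name_char_p (s[i]))
    i++;
  return i;
}
```

With this predicate, a scan over "UTF-16BE,version=1; foo" stops at the semicolon, and "NF_Z_62-010_(1973)" is accepted in full.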
Should we allow punctuation in general by calling ispunct (and
explicitly checking for semicolon) instead? (Note that isalnum and
ispunct will also check for locale-specific characters... of course,
the new encoding spec hasn't come into effect yet....)
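For comparison, the ispunct alternative might look roughly like this (again a sketch under assumptions; encoding_name_char_punct_p is a made-up name, and the results depend on the current locale, as noted above):

```c
#include <ctype.h>

/* Hypothetical helper, not part of libguile: accept any alphanumeric
   or punctuation character except the semicolon.  Whitespace and NUL
   are rejected because neither isalnum nor ispunct matches them.  */
static int
encoding_name_char_punct_p (char c)
{
  unsigned char uc = (unsigned char) c;

  return c != ';' && (isalnum (uc) || ispunct (uc));
}
```

One consequence of this version is that it also accepts characters such as '"' and '*', so a declaration written with no whitespace before the closing "-*-" would have that marker absorbed into the name.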
Ken
Allow more characters in coding system names in Emacs-style
declarations.
* libguile/read.c (scm_i_scan_for_encoding): Allow more punctuation
symbols in coding system names.
diff --git a/libguile/read.c b/libguile/read.c
index 775612a..657e101 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1506,8 +1506,7 @@ scm_i_scan_for_encoding (SCM port)
   i = 0;
   while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE
          && pos + i - header < bytes_read
-        && (isalnum((int) pos[i]) || pos[i] == '_' || pos[i] == '-'
-            || pos[i] == '.'))
+        && (isalnum((int) pos[i]) || strchr("_-.:/,+=()", pos[i]) != NULL))
     i++;
   if (i == 0)