Re: guile can't find a chinese named file
From: Andy Wingo
Subject: Re: guile can't find a chinese named file
Date: Sun, 26 Feb 2017 22:20:31 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
Hello,
I feel the need to correct points in this mail for the benefit of
guile-user. No reply is needed.
On Wed 15 Feb 2017 00:58, David Kastrup <address@hidden> writes:
> Mike Gran <address@hidden> writes:
>
>> But, for what it is worth, the Latin-1/UCS-32 design decision came
>> from a couple of conflicting requirements. The switch happened in the
>> 1.9.x series.
>>
>> There were several examples of legacy C code using Guile as an
>> extension language that accessed the bytes of a string directly, using
>> SCM_STRING_CHARS or scm_i_string_chars. To keep from breaking legacy
>> code, we needed to retain this (then already deprecated) capability
>> for C programs to access 8-bit-locale string internals directly.
>
> But if you don't know whether the strings are Latin-1 or UCS-32, that's
> sort of academical.
Not at all. Legacy programs don't use codepoints >255, so their strings
are always stored as Latin-1. For a string stored as UTF-32, attempting
to get the string data would throw an exception. The SCM_STRING_CHARS
hack was a good trade-off.
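
To make the trade-off concrete, here is a minimal sketch of a string type
with two internal representations. It is written in Python purely as an
illustration; the class name `DualString` and its `chars` method are
hypothetical and are not Guile's actual implementation or C API. It only
mirrors the behaviour described above: direct 8-bit access works for
narrow strings and raises for wide ones.

```python
from array import array


class DualString:
    """Toy string with a narrow (1 byte/char) or wide (32-bit/char) buffer.

    Hypothetical illustration only; not Guile's real data structure.
    """

    def __init__(self, text: str):
        if all(ord(ch) <= 0xFF for ch in text):
            # Every code point fits in one byte: store as Latin-1.
            self.narrow = True
            self.buffer = text.encode("latin-1")
        else:
            # At least one code point above 255: store full code points.
            self.narrow = False
            self.buffer = array("L", (ord(ch) for ch in text))

    def chars(self) -> bytes:
        """Legacy-style direct access to the 8-bit buffer."""
        if not self.narrow:
            raise ValueError("wide string has no 8-bit representation")
        return self.buffer


legacy = DualString("caf\u00e9")       # all code points <= 255
wide = DualString("\u6587\u4ef6")      # Chinese text, code points > 255
```

Code that only ever handles code points up to 255 never sees the error
path, which is why the old accessor could be kept for legacy programs.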
> The problem is that Guile is _constantly_ required to recode strings it
> is processing. And to add insult to injury, it cannot do this without
> data loss when its string encoding assumptions are wrong.
In Scheme, strings are sequences of characters. Encoding and decoding
is only needed when going to and from bytes. Guile supports a finite
number of encodings, so in general some encoding/decoding will always be
needed. The specific encoding may change over time.
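
The distinction between characters and bytes can be shown in a few
lines. This is a Python sketch rather than Guile code, and the variable
names are only illustrative: the string has the same characters whatever
encoding is later used to turn it into bytes.

```python
text = "na\u00efve"                     # five characters

as_latin1 = text.encode("latin-1")      # one byte per character
as_utf8 = text.encode("utf-8")          # the i-with-diaeresis takes two bytes

# Different byte sequences, same sequence of characters once decoded.
same_text = as_latin1.decode("latin-1") == as_utf8.decode("utf-8") == text
```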
> PostScript files are usually encoded in Latin-1 with occasional UCS-16
> passages. Reading and writing and copying such files byte-correctly
> while trying to actually parse their contents is not feasible with
> Guile.
Works perfectly well. The web server for example reads the request as
Latin-1 and the body as something else. Just re-set the port encoding
and there you go.
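
As a rough illustration of switching encodings part-way through a
stream: in Guile this is done by calling set-port-encoding! on the port,
while the sketch below shows the same idea in Python, with made-up
sample data. The header lines are decoded as Latin-1 and the body as
UTF-8. Latin-1 maps each of the 256 byte values to one character, so
decoding and re-encoding with it loses nothing.

```python
import io

# Made-up sample: an ASCII/Latin-1 header block followed by a UTF-8 body.
raw = (b"Content-Type: text/plain; charset=utf-8\r\n\r\n"
       + "\u6587\u4ef6".encode("utf-8"))
port = io.BytesIO(raw)

# Read header lines as Latin-1 until the blank separator line.
headers = []
while True:
    line = port.readline().decode("latin-1").rstrip("\r\n")
    if not line:
        break
    headers.append(line)

# "Re-set the encoding": the rest of the stream is decoded as UTF-8.
body = port.read().decode("utf-8")

# Latin-1 round-trips arbitrary bytes unchanged.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin-1").encode("latin-1")
```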
>> I still maintain that this design decision was a good one based on the
>> simplicity of implementation.
>
> As I said: the problem is not the chosen internal representation. The
> problem is that there is no API to access it, and it does not even map
> to string ports.
String ports have nothing to do with the discussion AFAIU. (Ports in
Guile are also sequences of bytes, which may be accessed through textual
interfaces as well. A string port must therefore have an associated
encoding in order to read/write the bytes. But no error is possible for
textual I/O with the default UTF-8 encoding, as all characters are
representable. Encoding to UTF-8 is fast and space-efficient.)
Andy