Re: [Guile-commits] GNU Guile branch, master, updated. release_1-9-2-164
From: Mike Gran
Subject: Re: [Guile-commits] GNU Guile branch, master, updated. release_1-9-2-164-g0d05ae7
Date: Tue, 08 Sep 2009 21:16:34 -0700
On Wed, 2009-09-09 at 01:00 +0200, Ludovic Courtès wrote:
> Hello!
>
> "Michael Gran" <address@hidden> writes:
>
> > http://git.savannah.gnu.org/cgit/guile.git/commit/?id=0d05ae7c4b1eddf6257f99f44eaf5cb7b11191be
>
> [...]
>
> > - return scm_getc (input_port);
> > + return scm_get_byte_or_eof (input_port);
>
> This is actually an earlier change, but the prototype of scm_getc is now
> different from that in 1.8. Presumably, this means that it’s not
> source-compatible with 1.8, e.g., on platforms where
> sizeof (int) < sizeof (scm_t_wchar), right?
The readline library can't handle UCS-4 codepoints, but it can handle
locale-encoded text. So it needs the raw bytes of the locale-encoded
characters, and scm_get_byte_or_eof returns those raw bytes instead of
doing the processing necessary to make codepoints.
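
To make the bytes-versus-codepoints distinction concrete, here is a
minimal sketch in plain C. It does not use libguile: struct byte_source,
next_byte_or_eof and utf8_two_byte_codepoint are hypothetical names made
up for illustration, and UTF-8 is only assumed as an example of a
multibyte locale encoding. The first function hands out raw bytes the way
a byte-oriented consumer such as readline expects them; the second shows
the extra decoding step needed to turn two such bytes into one codepoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Hypothetical stand-in for an input port: a buffer and a read position.
   This is not Guile's port structure.  */
struct byte_source
{
  const unsigned char *data;
  size_t len;
  size_t pos;
};

/* Return the next raw byte (0..255), or EOF once the buffer is exhausted.
   No decoding happens here; the caller sees the encoded bytes as-is.  */
int
next_byte_or_eof (struct byte_source *src)
{
  assert (src != NULL);
  if (src->pos >= src->len)
    return EOF;
  return src->data[src->pos++];
}

/* Combine a two-byte UTF-8 sequence (lead byte B0, continuation byte B1)
   into the codepoint it encodes.  Assumes the caller already knows the
   sequence is a valid two-byte one.  */
unsigned long
utf8_two_byte_codepoint (int b0, int b1)
{
  return ((unsigned long) (b0 & 0x1F) << 6) | (unsigned long) (b1 & 0x3F);
}
```

With the bytes 0xC3 0xA9 in the buffer, two calls to next_byte_or_eof
yield those two bytes separately, whereas a codepoint-producing reader
would deliver a single value, U+00E9.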
>
> > --- a/libguile/strings.h
> > +++ b/libguile/strings.h
> > @@ -111,7 +111,7 @@ SCM_API SCM scm_substring_shared (SCM str, SCM start, SCM end);
> > SCM_API SCM scm_substring_copy (SCM str, SCM start, SCM end);
> > SCM_API SCM scm_string_append (SCM args);
> >
> > -SCM_INTERNAL SCM scm_i_from_stringn (const char *str, size_t len,
> > +SCM_API SCM scm_i_from_stringn (const char *str, size_t len,
> > const char *encoding,
> > scm_t_string_failed_conversion_handler handler);
> > @@ -157,7 +157,7 @@ SCM_INTERNAL const scm_t_wchar *scm_i_string_wide_chars (SCM str);
> > SCM_INTERNAL SCM scm_i_string_start_writing (SCM str);
> > SCM_INTERNAL void scm_i_string_stop_writing (void);
> > SCM_INTERNAL int scm_i_is_narrow_string (SCM str);
> > -SCM_INTERNAL scm_t_wchar scm_i_string_ref (SCM str, size_t x);
> > +SCM_API scm_t_wchar scm_i_string_ref (SCM str, size_t x);
>
> Were these changes intended?
Well, one of the two of them was intended. :)
>
> > + (with-locale "en_US.iso88591"
> > + (pass-if-exception "no args" exception:wrong-num-args
> > + (regexp-quote))
>
> Is the locale part of the API? That is, should programs that use
> regexps explicitly ask for a locale with 8-bit encoding?
Basically yes. The libc regex is 8-bit, and Guile's regex code uses
scm_to_locale_string and scm_from_locale_string to convert the regex's
input and output. Until libunistring comes with Unicode regex, I think
this is the best we can do.
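
For reference, here is a small sketch of what "the libc regex is 8-bit"
means at the C level. It calls only the POSIX regcomp, regexec and
regfree functions directly; first_match_bytes is a hypothetical helper
name, and this is not how Guile's regex module is written. The point is
that both the pattern and the subject are plain char strings in the
current locale's encoding, and the match positions come back as byte
offsets into that string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: find the first match of the POSIX extended regex
   PATTERN in TEXT.  On success, store the match's byte offsets in *START
   and *END and return 0.  Return -1 if the pattern does not compile or
   nothing matches.  */
int
first_match_bytes (const char *pattern, const char *text,
                   size_t *start, size_t *end)
{
  regex_t re;
  regmatch_t m[1];
  int status;

  assert (pattern != NULL && text != NULL && start != NULL && end != NULL);

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  status = regexec (&re, text, 1, m, 0);
  regfree (&re);
  if (status != 0)
    return -1;

  /* rm_so and rm_eo are byte offsets from the start of TEXT.  */
  *start = (size_t) m[0].rm_so;
  *end = (size_t) m[0].rm_eo;
  return 0;
}
```

A caller holding a wide string would first have to convert it to a
locale-encoded char string before passing it in, and map the byte offsets
back afterwards.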
Thanks,
Mike