[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Wide string strategies
From: |
Ludovic Courtès |
Subject: |
Re: Wide string strategies |
Date: |
Thu, 09 Apr 2009 22:25:55 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/23.0.90 (gnu/linux) |
Hi!
Thank you for working on this!
Mike Gran <address@hidden> writes:
> For now, I think a good strategy is to make strings into a pseudo-class
> where the internals are opaque to most of Guile and strings are accessed
> through accessors and other methods.
Sounds reasonable.
> This was the strategy already begun with scm_to_locale_string but the
> code isn't fully committed to the idea. The function scm_i_string_chars
> exposes the internal representation of the string, and it is used
> throughout the code.
Yes, that's going to be difficult, but I can't think of a better
solution.
> - SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1,
> - SCM_ARG1, scm_i_symbol_chars (SCM_SNAME (proc)));
> + {
> + char *str = scm_to_locale_string (scm_symbol_to_string (SCM_SNAME
> (proc)));
> + SCM_WTA_DISPATCH_1 (*SCM_SUBR_GENERIC (proc), arg1, SCM_ARG1, str);
> + free (str);
> + }
This is the kind of thing we can't afford in most cases.
Here STR is only needed because `SCM_WTA_DISPATCH_1 ()' calls
`scm_wrong_type_arg ()', which operates on C strings.
One solution would be to change `scm_wrong_type_arg ()' to operate on
opaque strings (e.g., take an `SCM' instead of `const char *'). The
same applies to all the functions in "error.h", and probably many
others.
> - if (len > 0 && (s[0] == '/' || s[0] == '\\'))
> + if (len > 0 && (scm_i_string_ref_eq_char (filename, 0, '/')
> + || scm_i_string_ref_eq_char (filename, 0, '\\')))
I think procedures like `scm_i_string_ref_eq_char ()' are a good idea
because it fulfills the goal of having an opaque string type *and* the
goal of being able to handle them easily in C.
> - scm_ungets (scm_i_string_chars (str), scm_i_string_length (str), port);
> + buf = scm_to_locale_stringn (str, &len);
> + scm_ungets (buf, len, port);
> + free (buf);
Eventually, we might need to change `scm_ungets ()', or provide a
variant that takes an opaque string.
> @@ -1549,12 +1552,16 @@ SCM_DEFINE (scm_recvfrom, "recvfrom!", 2, 3, 0,
>
> /* recvfrom will not necessarily return an address. usually nothing
> is returned for stream sockets. */
> - buf = scm_i_string_writable_chars (str);
> + buf = scm_malloc (cend - offset);
All the POSIX interface needs fast access to ASCII strings. How about
something like:
const char *layout = scm_i_ascii_symbol_chars (SCM_PACK (slayout));
where `scm_i_ascii_symbol_chars ()' throws an exception if its argument
is a non-ASCII symbol?
This would mean special-casing ASCII stringbufs so that we can treat
them as C strings.
> +static const char *
> +make_failsafe_ascii_sz (const char *str, size_t len)
> +{
> + static char buf[SCM_FAILSAFE_STRING_LEN];
Ouch, that would be bug-prone.
> +SCM_INTERNAL int scm_i_string_ref_eq_char (SCM str, size_t x, char c);
> +SCM_INTERNAL int scm_i_symbol_ref_eq_char (SCM str, size_t x, char c);
> +SCM_INTERNAL int scm_i_string_ref_neq_char (SCM str, size_t x, char c);
> +SCM_INTERNAL int scm_i_symbol_ref_neq_char (SCM str, size_t x, char c);
I'd remove the `neq' variants.
> +SCM_INTERNAL int scm_i_string_ref_eq_int (SCM str, size_t x, int c);
Does it assume sizeof (int) >= 32 ?
> +SCM_INTERNAL size_t scm_i_string_contains_char (SCM str, char ch);
Since it really returns a boolean, I'd use `int' as the return type.
> +SCM_INTERNAL char *scm_i_string_to_write_sz (SCM str);
> +SCM_INTERNAL scm_t_uint8 *scm_i_string_to_u8sz (SCM str);
> +SCM_INTERNAL SCM scm_i_string_from_u8sz (const scm_t_uint8 *str);
> +SCM_INTERNAL const char *scm_i_string_to_failsafe_ascii_sz (SCM str);
> +SCM_INTERNAL const char *scm_i_symbol_to_failsafe_ascii_sz (SCM str);
What does "sz" mean?
> +/* For ASCII strings, SUB can be used to represent an invalid
> + character. */
> +#define SCM_SUB ('\x1A')
Why SUB? How about `SCM_I_SUB_CHAR', `SCM_I_INVALID_ASCII_CHAR' or
similar?
Thanks,
Ludo'.