guile-devel

Re: Unicode and Guile


From: Marius Vollmer
Subject: Re: Unicode and Guile
Date: Wed, 12 Nov 2003 03:30:23 +0100
User-agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3 (gnu/linux)

Tom Lord <address@hidden> writes:

>     > >     ~ (make-text-marker text index) => <marker>
>
>     > What about having _only_ markers and not allow integers as
>     > indices?
>
> Seems excessive and arbitrary.  How do I implement (Emacs') GOTO-CHAR
> without standing on my head?

Yes, right, there need to be conversions between markers and integers,
but I'm worried that people will write code like

    (do ((i 0 (1+ i)))
        ((>= i (text-length text)))
      (... (text-ref text i) ...))

and we'll have trouble implementing this efficiently for graphemes of
variable sizes.  When people are encouraged to use markers like this

    (do ((i (text-start text) (marker-forward i 1)))
        ((marker-at-end? i))
      (... (marker-ref i) ...))

things should be easier.  (Of course, there should also be things like
'text-map', etc.)

> (I strongly suggest splay trees as an ideal implementation strategy
> for TEXT?.  They would make _both_ mutating and functional
> REPLACE efficient.)

Ok, if there is no cost for making texts mutable, we should of course
do it.

>
>     > >   There is no essential difference between a grapheme and a text
>     > >   object of length 1, and thus the proposal makes GRAPHEME? a 
>     > >   subtype of TEXT?.
>
>     > Do we need the concept of grapheme at all, then?
>
> Interesting question!  And it ties in with your question about "why
> not just markers and not integer indexes".
>
> I don't see a good way to ground markers _without_ integer indexes.

Yes.  What I'm worried about is that it is expensive to go from an
integer index to the memory location where the indicated grapheme is
stored.  On the other hand, it is easy to increment the marker to the
next grapheme in a text.
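
To make the cost difference concrete, here is a minimal sketch, not a
proposed implementation.  It assumes a text is stored as a UTF-8
bytevector and treats one code point as the unit, which is a
simplification of "grapheme".  The names make-text, sequence-length,
text->list and the <marker> record layout are hypothetical;
text-start, marker-forward, marker-at-end?, marker-ref and text-ref
are the names used in this thread, given illustrative definitions.

```scheme
(use-modules (rnrs bytevectors)
             (srfi srfi-9))

;; ASSUMPTION: a text is a UTF-8 bytevector; the unit is a code point.
(define (make-text str) (string->utf8 str))

(define-record-type <marker>
  (make-marker text offset)
  marker?
  (text marker-text)
  (offset marker-offset))          ; byte offset of a lead byte

;; Length in bytes of the UTF-8 sequence that starts with BYTE.
(define (sequence-length byte)
  (cond ((< byte #x80) 1)
        ((< byte #xE0) 2)
        ((< byte #xF0) 3)
        (else 4)))

(define (text-start text) (make-marker text 0))

(define (marker-at-end? m)
  (>= (marker-offset m) (bytevector-length (marker-text m))))

;; Return a new marker N units further on (stops at the end).
;; Each single step only inspects one lead byte.
(define (marker-forward m n)
  (let ((bv (marker-text m)))
    (let loop ((off (marker-offset m)) (n n))
      (if (or (zero? n) (>= off (bytevector-length bv)))
          (make-marker bv off)
          (loop (+ off (sequence-length (bytevector-u8-ref bv off)))
                (- n 1))))))

;; Decode the unit under the marker (marker must not be at the end).
(define (marker-ref m)
  (let* ((bv (marker-text m))
         (off (marker-offset m))
         (len (sequence-length (bytevector-u8-ref bv off)))
         (slice (make-bytevector len)))
    (bytevector-copy! bv off slice 0 len)
    (string-ref (utf8->string slice) 0)))

;; Integer indexing must walk from the start: cost grows with I.
(define (text-ref text i)
  (marker-ref (marker-forward (text-start text) i)))

;; Marker-based traversal: one cheap step per unit.
(define (text->list text)
  (let loop ((m (text-start text)) (acc '()))
    (if (marker-at-end? m)
        (reverse acc)
        (loop (marker-forward m 1) (cons (marker-ref m) acc)))))

;; A sample text with 1-, 2- and 3-byte sequences.
(define sample
  (make-text (string #\a (integer->char #x3BB) (integer->char #x20AC))))
```

With this representation, a loop that calls text-ref for every index
rescans the text from the start each time, so the whole traversal is
quadratic in the length, while the marker loop touches each unit once.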

> Graphemes are a reasonable "what the user thinks of as a character".

Yep, the concept of graphemes is useable, if only in the
documentation.  What I really had in mind was not the concept, but the
data type.  Is it important to have a new data type, or could we just
have

    (define (grapheme? obj) (and (text? obj) (= (text-length obj) 1)))
    (define grapheme=? text=?)
    (define grapheme<? text<?)
    ...

'read-grapheme' etc. would probably need to remain.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405



