lilypond-user

Re: Multi-byte characters in Lyrics


From: David Kastrup
Subject: Re: Multi-byte characters in Lyrics
Date: Fri, 27 Oct 2017 12:34:31 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)

Maurits Lamers <address@hidden> writes:

>> On 27 Oct 2017, at 10:18, David Kastrup <address@hidden> wrote:
>> 
>> Maurits Lamers <address@hidden> writes:
>> 
>>> Hi,
>>> 
>>>>> 
>>>>> I cannot convert a multi-byte character to a symbol, unless I do some
>>>>> very inelegant hacks.
>>>> 
>>>> Huh?  string->symbol works just fine.  So what do you mean when you say
>>>> "symbol"?
>>> 
>>> This is partly because of a mistake on my end. I defined my braille
>>> dots lookup alist through symbols.
>>> 
>>> brailleSymbols = #`(
>>> (1 . 1)
>>> (2 . 12)
>>> (3 . 14)
>>> (4 . 145)
>>> )
>> 
>> There is no symbol here whatsoever.
>
> Mmm, I didn't get it to work without converting the char to a symbol.
> This probably also has to do with me having to dive into Scheme after
> just a few lessons quite a few years ago. :)

But there are no symbols in that alist to look up.  Just numbers.

>> Oh, bit shifting?  Probably for arriving at integers (rather than
>> characters)?  I was thinking of that but decided that sticking with
>> single-character strings was more likely to result in readable code.
>
> The bitshifting solution looks like this:
>
> #(define ((clz n) x)
>   (let loop ((i 1) (x x))
>     (if (< i n)
>         (loop (ash i 1) (logior x (ash x (- 0 i))))
>         (- n (logcount x)))))
>
> #(define (bitwise-andc1 x y)
>   (logand (lognot x) y))
>
> #(define (utf8-n o)
>   (max 1 ((clz 8) (bitwise-andc1 o #xff))))
>
> #(define (string->utf8-list str)
>   ;; Split STR into a list of one-character strings, reading each
>   ;; character's byte length from its lead byte.
>   (if (string-null? str)
>       '()
>       (let ((n (utf8-n (char->integer (string-ref str 0)))))
>         (cons (string-copy str 0 n)
>               (string->utf8-list (string-copy str n))))))
>
> The bitshifting is used to figure out how many high bits are set, as
> in UTF8 those indicate how many chars it takes to have the full
> character.

Well, you can figure out the number of bytes in a character from its
starting byte.  But UTF-8 has more features than that: continuation
bytes always lie in the range 0x80-0xbf, so they can be recognized on
their own.  My code is actually too complex; it suffices to do

#(define (b->c input)
   (cdr
    (string-fold-right
     (lambda (new tail)
       (if (char<? #\177 new #\300)
           (cons (cons new (car tail)) (cdr tail))
           (cons* '() (apply string new (car tail)) (cdr tail))))
     '(())
     input)))

Basically just go from right to left, collect all continuation bytes, and
tack them onto what comes before.  Of course, this requires valid UTF-8
to start with, but then so does the other approach, which is more complex.
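
For readers who want to see the byte values spelled out, here is a
minimal sketch of the same right-to-left pass in Python rather than
Guile.  The function name split_utf8 and the sample string are
illustrative assumptions, not part of the LilyPond code above, and the
input is assumed to be valid UTF-8.

```python
def split_utf8(data: bytes) -> list:
    """Split valid UTF-8 bytes into one chunk per character.

    Walks from right to left: continuation bytes (0x80-0xBF) are held
    back until the byte that starts their character shows up.
    """
    pending = []  # continuation bytes waiting for their lead byte
    result = []   # finished characters, built back to front
    for byte in reversed(data):
        if 0x80 <= byte <= 0xBF:
            # Continuation byte: keep it in front of those seen so far.
            pending.insert(0, byte)
        else:
            # ASCII or lead byte: close off one character.
            result.insert(0, bytes([byte] + pending))
            pending = []
    return result


chunks = split_utf8("a\u2079\u00e9".encode("utf-8"))
print(chunks)  # three chunks of 1, 3 and 2 bytes
```

The range test mirrors (char<? #\177 new #\300) in the Scheme version:
octal 177 and 300 are 0x7f and 0xc0, so exactly the bytes 0x80-0xbf are
treated as continuation bytes.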

> The solution then splits the string by the number of chars it takes to
> assemble the full character

Just split before any byte outside the range 0x80-0xbf.
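
To compare the two ways of splitting, the following Python sketch
implements both: one reads the character length from the lead byte (as
the bit-shifting code does), the other starts a new chunk before every
byte outside 0x80-0xbf.  All function names here are hypothetical, and
both variants assume valid UTF-8 input.

```python
def lead_byte_length(byte: int) -> int:
    """Number of bytes in the character that starts with this byte.

    Counts the leading 1 bits; ASCII bytes have none, hence max(1, ...).
    """
    ones = 0
    mask = 0x80
    while byte & mask:
        ones += 1
        mask >>= 1
    return max(1, ones)


def split_by_length(data: bytes) -> list:
    """Split by reading each character's length from its lead byte."""
    out = []
    i = 0
    while i < len(data):
        n = lead_byte_length(data[i])
        out.append(data[i:i + n])
        i += n
    return out


def split_before_non_continuation(data: bytes) -> list:
    """Start a new chunk before every byte outside 0x80-0xBF."""
    out = []
    for byte in data:
        if 0x80 <= byte <= 0xBF and out:
            out[-1] += bytes([byte])   # continuation: extend current chunk
        else:
            out.append(bytes([byte]))  # ASCII or lead byte: new chunk
    return out


sample = "Ab\u2813\u00e9".encode("utf-8")  # includes a braille pattern
print(split_by_length(sample) == split_before_non_continuation(sample))
```

On valid input the two give the same chunks; the second never has to
inspect how many high bits the lead byte has set.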

>> Though I figured with some consternation that something like
>> 
>> "⁹" resulted in garbage being printed, so the readability does not
>> really extend to the output.
>
> Interesting, will have to try the bitshifting solution with that :)

It has nothing to do with how you arrive at "⁹": Guile-1.8 will not
print it properly.

-- 
David Kastrup


