octave-maintainers

Re: Handle encoding of Octave strings


From: John W. Eaton
Subject: Re: Handle encoding of Octave strings
Date: Wed, 16 May 2018 17:24:13 -0400

On 05/16/2018 04:10 PM, mmuetzel wrote:
I would like to make "islower" and "isupper" Unicode-aware.
At the moment, I see the following:
octave:1> islower ('ä')
ans =

   0  0

Since we are using UTF-8 for character arrays, the single lower-case letter
"ä" is represented by two bytes:
octave:2> size ('ä')
ans =

    1   2
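
To make the two bytes visible, here is a minimal sketch. It assumes the character array holds UTF-8, and it builds the string from explicit byte values (an assumption made only so that the result does not depend on the encoding of the terminal or source file):

```octave
% Build "ä" from its UTF-8 bytes (0xC3 0xA4) instead of typing the
% literal, so the example is independent of terminal/file encoding.
s = char ([195 164]);

bytes = double (s)    % the two stored elements: 195 164
numel (s)             % counts elements (bytes), not characters: 2
```

Each element of the character array is one byte of the encoding, which is why size reports two columns for a single letter.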

Should islower('ä') return true(1,2) or true(1,1)? I am leaning towards the
former.

This leads to the bigger question: How should indexing on (multi-byte)
character arrays work? At the moment, a user has to be somewhat aware of the
fact that Octave uses UTF-8:
octave:3> str = "aäbc"
str = aäbc
octave:4> str(1)
ans = a
octave:5> str(2)
ans = �
octave:6> str(3)
ans = �
octave:7> str(4)
ans = b
octave:8> str(2:3)
ans = ä

To index the second character in the string, the user has to access the
second and (!) third elements. The third character is indexed with the fourth
element, and so forth.
Is this OK?
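
For illustration, a rough sketch of how a user could map byte positions to character positions today. It relies only on the UTF-8 rule that continuation bytes have the bit pattern 10xxxxxx. The names is_start, char_idx and nth_char are hypothetical and not part of Octave; the string is again built from explicit byte values as an assumption, to avoid any dependence on terminal encoding:

```octave
% "aäbc" as UTF-8 bytes: a = 97, ä = 195 164, b = 98, c = 99.
str = char ([97 195 164 98 99]);
bytes = double (str);

% Continuation bytes look like 10xxxxxx; any other byte starts a
% new character.
is_start = bitand (bytes, 192) != 128;

% char_idx(k) is the number of the character that byte k belongs to.
char_idx = cumsum (is_start);            % [1 2 2 3 4]

% Hypothetical helper: all bytes that make up the n-th character.
nth_char = @(s, idx, n) s(idx == n);

second = nth_char (str, char_idx, 2);    % same bytes as str(2:3)
third  = nth_char (str, char_idx, 3);    % same as str(4), i.e. "b"
```

This does not validate the input and would give misleading results for malformed UTF-8, so it is only meant to show the bookkeeping that byte-wise indexing currently leaves to the user.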

What does Matlab do? If your choice is different, I am sure that we will see bug reports about it.

jwe


