Re: Handle encoding of Octave strings

octave-maintainers

From:	John W. Eaton
Subject:	Re: Handle encoding of Octave strings
Date:	Wed, 16 May 2018 17:24:13 -0400

On 05/16/2018 04:10 PM, mmuetzel wrote:

I would like to make "islower" and "isupper" Unicode aware.
At the moment, I see the following:
octave:1> islower ('ä')
ans =

   0  0

Since we are using UTF-8 for character arrays, the single lower-case letter
"ä" is represented by two bytes:
octave:2> size ('ä')
ans =

    1   2

Should islower('ä') return true(1,2) or true(1,1)? I am tending towards the
former.
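
To make the two options concrete, here is a minimal sketch of both return shapes. It is written in Python purely as an illustration and is not Octave code or a proposed implementation; the helper names islower_per_byte and islower_per_char are hypothetical. The per-byte variant assumes that every byte of a multi-byte character simply inherits that character's flag, which is what a true(1,2) result for 'ä' would mean.

```python
def islower_per_byte(text):
    """One flag per UTF-8 byte: each byte inherits its character's flag."""
    flags = []
    for ch in text:
        n_bytes = len(ch.encode("utf-8"))  # 1 for ASCII, 2 for 'ä'
        flags.extend([ch.islower()] * n_bytes)
    return flags


def islower_per_char(text):
    """One flag per character (code point), independent of the encoding."""
    return [ch.islower() for ch in text]


a_umlaut = "\u00e4"  # 'ä' as a single precomposed code point
print(islower_per_byte(a_umlaut))  # [True, True]  (same shape as size ('ä'))
print(islower_per_char(a_umlaut))  # [True]
```

The per-byte shape keeps the result the same size as the character array itself, so it could still be used as a logical mask for indexing into the array; the per-character shape could not.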

This leads to the bigger question: How should indexing on (multi-byte)
character arrays work? At the moment, a user has to be somewhat aware of the
fact that Octave uses UTF-8:
octave:3> str = "aäbc"
str = aäbc
octave:4> str(1)
ans = a
octave:5> str(2)
ans = �
octave:6> str(3)
ans = �
octave:7> str(4)
ans = b
octave:8> str(2:3)
ans = ä

To index the second character in the string, the user has to access the
second and(!) third element. The third character is indexed with the fourth
element and so forth.
Is this OK?
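
The mapping between character positions and array elements described above can be sketched as follows. Again this is a Python illustration only, not Octave code; char_byte_ranges is a hypothetical helper, and positions are 1-based to match the Octave session.

```python
def char_byte_ranges(text):
    """Return the 1-based, inclusive (first, last) UTF-8 byte positions
    occupied by each character of text."""
    ranges = []
    pos = 1
    for ch in text:
        n_bytes = len(ch.encode("utf-8"))
        ranges.append((pos, pos + n_bytes - 1))
        pos += n_bytes
    return ranges


s = "a\u00e4bc"  # "aäbc" with 'ä' as a single precomposed code point
print(len(s.encode("utf-8")))  # 5
print(char_byte_ranges(s))     # [(1, 1), (2, 3), (4, 4), (5, 5)]
```

The second character occupies elements 2 and 3, and the third character sits at element 4, which matches the str(2:3) and str(4) results in the session above.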

What does Matlab do? If your choice is different, I am sure that we will see
bug reports about it.

jwe
