From: Andrew Janke
Subject: [Octave-bug-tracker] [bug #54170] java.lang.String.toCharArray result incorrect conversion to char matrix
Date: Sun, 1 Jul 2018 01:07:24 -0400 (EDT)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36
Follow-up Comment #15, bug #54170 (project octave):
Attaching a patch that fixes the segfault without changing any Octave char
type semantics: fix-java-char-boxing-segfault.patch
I had to pull the char conversion out of the macro in ov-java.cc, because the
macro assumes that the sizes for the underlying primitive types are the same
for Java and Octave, and char breaks this assumption.
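To make the size mismatch concrete, here is a rough sketch in Python. It is not the macro from ov-java.cc. It assumes a Java char is one 16-bit UTF-16 code unit, that an Octave char element is one byte, and that the code units are laid out little-endian in memory (an assumption about the machine). The helper name is made up for illustration.

```python
import struct

def utf16_units_and_bytes(text):
    """Return the 16-bit code units of `text` and the same data viewed bytewise."""
    raw = text.encode("utf-16-le")            # two bytes per code unit (little-endian assumed)
    count = len(raw) // 2
    units = struct.unpack("<%dH" % count, raw)  # what a Java char[] would hold
    return units, list(raw)                     # what a 1-byte-per-element view sees

units, byte_view = utf16_units_and_bytes("world")
print(len(units))      # number of 16-bit chars
print(len(byte_view))  # number of 8-bit elements over the same memory
print(byte_view)
```

Five 16-bit chars become ten 1-byte elements, every second one a zero, when the data is carried over without narrowing or re-encoding.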
With this in place, you can see some of the difficulty in the situation. The
patch creates an entry point for Octave char values that contain UTF-16
instead of UTF-8 data, whereas most Octave functions that work with char
assume that the contents are UTF-8 (or the local character set?).
You can combine these results with chars produced by Octave '...' char
literals, and the operations complete without errors, but the contents are
technically junk: they contain a mix of UTF-8 and UTF-16 data, and are
thus not valid Unicode strings.
Example:
octave:11> jStr = javaObject('java.lang.String', 'world')
jStr =
<Java object: java.lang.String>
octave:12> format compact
octave:13> jStr = javaObject('java.lang.String', 'world')
jStr =
<Java object: java.lang.String>
octave:14> c = jStr.toCharArray'
c = world
octave:15> size(c)
ans =
1 10
octave:16> s1 = sprintf('Hello, %s!', jStr.toCharArray')
s1 = Hello, world!
octave:17> size(s1)
ans =
1 18
octave:18> num2cell(s1')
ans =
{
[1,1] = H
[2,1] = e
[3,1] = l
[4,1] = l
[5,1] = o
[6,1] = ,
[7,1] =
[8,1] = w
[9,1] =
[10,1] = o
[11,1] =
[12,1] = r
[13,1] =
[14,1] = l
[15,1] =
[16,1] = d
[17,1] =
[18,1] = !
}
octave:19> double(s1)
ans =
72 101 108 108 111 44 32 119 0 111 0 114 0
108 0 100 0 33
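The byte values above can be reproduced outside Octave. This is only a sketch in Python, assuming the literal parts of the format string are UTF-8 and the Java-derived part is UTF-16 in little-endian order. It illustrates the mixed encoding and does not show what sprintf does internally.

```python
# Literal pieces as an Octave '...' literal would store them (UTF-8 assumed).
prefix = "Hello, ".encode("utf-8")
suffix = "!".encode("utf-8")

# The piece that came from Java, kept as raw UTF-16 code units (little-endian assumed).
java_part = "world".encode("utf-16-le")

mixed = prefix + java_part + suffix
print(len(mixed))
print(list(mixed))

# Decoding the whole buffer as UTF-8 raises no error, because a zero byte is a
# valid UTF-8 encoding of U+0000, but the text is not the intended string.
as_utf8 = mixed.decode("utf-8")
print(as_utf8 == "Hello, world!")
```

With pure ASCII input the damage is limited to embedded U+0000 characters. Non-ASCII text would generally produce byte sequences that are not valid UTF-8 at all.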
I haven't played around with this much, but I bet at least some Octave
functions (maybe in plotting?) will break on input like this, or even on valid
UTF-16 data in char arrays.
For example:
octave:22> surf(peaks)
octave:23> title(s1)
The displayed title is "Hello, w". I bet this is because some C-style API is
confused by the 0-valued bytes embedded in the string.
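That guess is at least consistent with how NUL-terminated strings behave. The Python sketch below uses ctypes to read the same 18 bytes the way a C-style API would. It assumes the same UTF-8 plus little-endian UTF-16 layout as the output above, and it does not show that Octave's plotting code actually takes such a path.

```python
import ctypes

# Same mixed buffer as in the session: UTF-8 literal parts, UTF-16LE Java part (assumed).
mixed = b"Hello, " + "world".encode("utf-16-le") + b"!"

# A C string ends at the first zero byte; .value reads the buffer that way.
buf = ctypes.create_string_buffer(mixed)
c_view = buf.value

print(len(mixed))   # bytes actually stored
print(c_view)       # what a NUL-terminated reader sees
```

The C-style view stops right after the "w", which matches the truncated title.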
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?54170>