Re: [Bug-apl] Incorrect encoding in ATF files.
From: Kacper Gutowski
Date: Mon, 20 Jan 2014 02:39:19 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On 2014-01-19 16:29:44, Juergen Sauermann wrote:
> I changed the following in SVN version 98:
Thanks!
Now a whole workspace exported from NARS2000 can be imported into GNU APL
without any problems (as long as it doesn't contain unsupported features
like dfns or function trains, of course).
As a test, I also tried to import a clear workspace from Dyalog APL and
stumbled upon two records where it still fails:
⎕RTL←0
⎕TRAP←0⍴⊂⍬ ' ' ''.
The first one surprised me, as other unknown variables are now ignored,
but when I enter ⎕RTL←0 manually into the session, I get:
⎕RTL←0
VALUE ERROR
⎕R TL←0
^
and the variable TL gets set to 0, so maybe it's a problem in the parser?
The second record seems to fail because of the zilde in it.
Dyalog encodes ⍬ as byte 0x0B, which GNU APL interprets literally as the
control character U+000B (vertical tab), so it raises an error because the
character appears outside a quoted string. I'm not really sure how this
should be dealt with. It doesn't seem to be a problem on GNU APL's end.
Interestingly, GNU APL writes ⍬ in a portable manner as (0⍴0), while
NARS2000 creates an invalid file that it can't read itself.
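
To make the byte-level mismatch concrete, here is a small Python sketch. It assumes, for illustration only, that an importer maps each byte of the record straight to the code point with the same value (modelled here with a latin-1 decode); nothing below is taken from GNU APL's actual reader.

```python
import unicodedata

ZILDE = "\u236c"  # the character ⍬

# The single byte reported above for ⍬ in the Dyalog-written record.
raw = bytes([0x0B])

# Assumption: a byte-for-byte reading, one byte -> one code point.
as_text = raw.decode("latin-1")

print(hex(ord(as_text)))               # 0xb
print(unicodedata.category(as_text))   # Cc, i.e. a control character
print(unicodedata.name(ZILDE))         # Unicode name of ⍬
print(as_text == ZILDE)                # False
```

Read this way, the byte can never come out as ⍬ unless the importer knows the exporter's private byte-to-character mapping, which is why the portable (0⍴0) spelling sidesteps the problem.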
> for example when Unicode characters that are not contained in the
> charset of the interpreter
> appear in strings.
I did some testing on this.
As for the encoding of Unicode characters outside the transport character
set: on )OUT, GNU APL correctly encodes character arrays using ⎕UCS, and
both Dyalog and NARS2000 read them without problems. NARS2000 again can
produce a file that it can't read itself, and Dyalog APL uses a catenated
mixture of literal quoted strings, parts encoded with ⎕ucs something, and
⎕av[⎕io+something] (sic!), depending on the contents. Notably, GNU APL
fails when trying to read catenated values, while NARS2000 has no problems
with them (but it gets wrong values when ⎕av is used, obviously).
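
The catenated form described above can be sketched in Python as follows. This is only an illustration of the idea, not Dyalog's actual exporter: the helper names are hypothetical, printable ASCII is assumed as a stand-in for the real transport character set, and the ⎕av variant is left out.

```python
def is_portable(ch):
    # Assumption: printable ASCII stands in for the transport character set.
    return " " <= ch <= "~"


def apl_quote(text):
    # APL character literals double any embedded single quote.
    return "'" + text.replace("'", "''") + "'"


def encode_for_transfer(text):
    """Build an APL expression that rebuilds `text` from quoted and ⎕UCS runs."""
    runs = []  # list of (portable, [characters]) in original order
    for ch in text:
        portable = is_portable(ch)
        if runs and runs[-1][0] == portable:
            runs[-1][1].append(ch)
        else:
            runs.append((portable, [ch]))

    parts = []
    for index, (portable, chars) in enumerate(runs):
        if portable:
            parts.append(apl_quote("".join(chars)))
        else:
            expr = "⎕UCS " + " ".join(str(ord(c)) for c in chars)
            # APL evaluates right to left, so a ⎕UCS run that is followed by
            # more parts must be parenthesised to limit its right argument.
            if index < len(runs) - 1:
                expr = "(" + expr + ")"
            parts.append(expr)
    return ",".join(parts) if parts else "''"


print(encode_for_transfer("abcąłćę"))  # 'abc',⎕UCS 261 322 263 281
```

A one-character input would yield a scalar rather than a one-element vector on the APL side; a real exporter would have to handle that case, which this sketch does not.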
Inside functions, however, all characters that can't be encoded directly
are lost when exporting from GNU APL, i.e. they are coerced to ░ when
read back by GNU APL or NARS2000, or to Ý by Dyalog. For the record,
NARS2000 exports them the same way.
But since functions are transported as a ⎕FX call with a nested character
array as the right argument, they could be encoded with ⎕UCS as well.
This is how Dyalog exports such functions (actually, again, it catenates
them from various parts), and both Dyalog and NARS2000 can import
functions formed this way, but GNU APL currently cannot (even when only
a single ⎕UCS call is used, without catenation).
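
As a rough Python sketch of that idea, the hypothetical helper below writes each function line either as a quoted literal or, if it contains any character outside an assumed transport set (printable ASCII here), as a single ⎕UCS call over the whole line. It only illustrates the shape of such a record and is not how any of the three interpreters is known to implement it.

```python
def is_portable(ch):
    # Assumption: printable ASCII stands in for the transport character set.
    return " " <= ch <= "~"


def fx_record(lines):
    """Format function lines as a ⎕FX call with one item per line."""
    items = []
    for line in lines:
        if all(is_portable(ch) for ch in line):
            # Plain line: quoted literal with embedded quotes doubled.
            items.append("'" + line.replace("'", "''") + "'")
        else:
            # Line with non-portable characters: spell out every code point.
            code_points = " ".join(str(ord(ch)) for ch in line)
            items.append("(⎕UCS " + code_points + ")")
    return "⎕FX " + " ".join(items)


# Header line 'test1' and a body line consisting of the literal 'ąćśł'.
print(fx_record(["test1", "'ąćśł'"]))
# ⎕FX 'test1' (⎕UCS 39 261 263 347 322 39)
```

The sketch ignores corner cases such as one-character lines or single-line functions, where the APL expression would no longer be a nested vector of character vectors.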
To sum up my findings:
1. ⎕RTL isn't correctly ignored.
2. Functions formed with ⎕UCS can't be read from ATF file.
For example, attached file test1.atf (I hope it won't get stripped by
the list) contains a function printing 'ąćśł' formed as:
⎕FX 'test1' (⎕UCS 39 261 263 347 322 39)
Both Dyalog APL and NARS2000 read it in correctly. GNU APL gives
this error:
)in test1
Avec::uni_to_token() : Char U+0105 (ą) not found in ⎕AV! (called from Tokenizer.cc:80)
Offending token: 0x56020011 (VOID)
DOMAIN ERROR
immediate_execution() caught APL error 0x50004 (DOMAIN ERROR)
3. Character arrays formed by catenation can't be read from ATF file.
For example, see attached x.atf which contains a variable x←'abcąłćę'
formed as:
x←'abc',⎕UCS 261 322 263 281
Both Dyalog APL and NARS2000 import it correctly, GNU APL gives this error:
)in x
==============================================================================
Assertion failed: var_or_fun.size()
in Function: array_2TF
in file: Command.cc:1001
Call stack:
----------------------------------------
-- Stack trace at Command.cc:1001
----------------------------------------
0x7faf1a4a4995 __libc_start_main
0x433e5d main
0x5114fd Workspace::immediate_execution(bool)
0x45edfd Command::process_line()
0x45e4c8 Command::process_line(UCS_string&)
0x45d46a Command::cmd_IN(std::ostream&, std::vector<UCS_string, std::allocator<UCS_string> >&, bool)
0x45d0bb Command::transfer_context::process_record(unsigned char const*, std::vector<UCS_string, std::allocator<UCS_string> > const&)
0x45b52f Command::transfer_context::array_2TF(std::vector<UCS_string, std::allocator<UCS_string> > const&) const
0x44213f do_Assert(char const*, char const*, char const*, int)
========================================
SI stack:
==============================================================================
*** immediate_execution() caught other exception ***
4. Unicode characters outside the transport charset are not preserved
inside functions.
This could be fixed by exporting them with ⎕UCS, as is done for character
arrays, but at the moment such a function can't be read either, as I
mentioned above.
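
For anyone who wants to double-check the code points quoted in findings 2 and 3 without an APL interpreter at hand, a few lines of Python are enough. The `ucs` helper is a hypothetical stand-in that mimics only the integers-to-characters direction of monadic ⎕UCS.

```python
def ucs(*code_points):
    # Stand-in for monadic ⎕UCS applied to integers: code points -> characters.
    return "".join(chr(cp) for cp in code_points)


# Body line of test1, from: ⎕FX 'test1' (⎕UCS 39 261 263 347 322 39)
body = ucs(39, 261, 263, 347, 322, 39)

# Variable x, from: x←'abc',⎕UCS 261 322 263 281
x = "abc" + ucs(261, 322, 263, 281)

print(body)  # 'ąćśł'
print(x)     # abcąłćę
```

Both results match the values the records are meant to carry, so the failures come from how the records are read, not from the records themselves.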
Sorry for the lengthy mail.
-k
Attachment: test1.atf (binary data)
Attachment: x.atf (binary data)