Re: [Bug-apl] How do I convert a byte sequence to Unicode?

From: David Lamkins
Subject: Re: [Bug-apl] How do I convert a byte sequence to Unicode?
Date: Sun, 27 Apr 2014 22:24:40 -0700

I take it that this table describes the encoding of the byte stream:

http://en.wikipedia.org/wiki/UTF-8#Description

(I might actually attempt this in APL, just to see whether I can do it while waiting for a built-in translation...)
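As a rough sketch of what that table implies, here is the decoding step written in Python rather than APL (an APL version would follow the same steps). The function name `utf8_decode` is hypothetical and is not part of GNU APL or lib_file_io. It assumes the input is a plain sequence of integers in the range 0 to 255.

```python
def utf8_decode(byte_values):
    """Turn a sequence of UTF-8 byte values into a list of code points."""
    code_points = []
    i = 0
    n = len(byte_values)
    while i < n:
        b = byte_values[i]
        # The lead byte gives the sequence length and the first payload bits.
        if b < 0x80:                # 0xxxxxxx: single byte (ASCII)
            cp, extra, min_cp = b, 0, 0
        elif 0xC0 <= b < 0xE0:      # 110xxxxx: one continuation byte follows
            cp, extra, min_cp = b & 0x1F, 1, 0x80
        elif 0xE0 <= b < 0xF0:      # 1110xxxx: two continuation bytes follow
            cp, extra, min_cp = b & 0x0F, 2, 0x800
        elif 0xF0 <= b < 0xF8:      # 11110xxx: three continuation bytes follow
            cp, extra, min_cp = b & 0x07, 3, 0x10000
        else:
            raise ValueError("invalid lead byte at index %d" % i)
        if i + extra >= n:
            raise ValueError("truncated sequence at index %d" % i)
        # Each continuation byte is 10xxxxxx and contributes six bits.
        for k in range(1, extra + 1):
            c = byte_values[i + k]
            if c & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte at index %d" % (i + k))
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong forms, surrogates, and values past U+10FFFF.
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("invalid code point at index %d" % i)
        code_points.append(cp)
        i += extra + 1
    return code_points
```

The resulting code points are the kind of integer vector that monadic ⎕UCS turns back into characters.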

On Sun, Apr 27, 2014 at 10:00 PM, Elias Mårtenson <address@hidden> wrote:

To convert byte values to code points, you need to apply an encoding algorithm, and that's kind of messy.

(I believe the rest of GNU APL kind of assumes that UTF-8 is the standard encoding used, which does make things simpler).

I have a suggestion: Make ⎕UCS support a dyadic form where the left-hand side specifies the encoding to use. For example:

'UTF-8' ⎕UCS 99 100 101 102

Handling multiple encodings is easily done through the libiconv library. I worked on it when I made some improvements to its Common Lisp integration. It's quite simple to use.
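To make the proposal concrete, here is a minimal sketch of the intended behaviour in Python. It is only an illustration: the dyadic ⎕UCS form above is a suggestion, not an existing GNU APL feature, the helper name `ucs` is hypothetical, and Python's built-in codecs stand in for libiconv.

```python
def ucs(encoding, byte_values):
    """Hypothetical analogue of `encoding ⎕UCS bytes`: decode bytes to text."""
    # bytes() checks that every value is in 0..255; decode() applies the
    # named encoding and raises UnicodeDecodeError on malformed input.
    return bytes(byte_values).decode(encoding)


# The example from the proposal: four ASCII bytes decoded as UTF-8.
ascii_text = ucs("UTF-8", [99, 100, 101, 102])

# The same character arrives as different bytes under different encodings.
from_utf8 = ucs("UTF-8", [195, 165])      # two bytes for U+00E5
from_latin1 = ucs("ISO-8859-1", [229])    # one byte for U+00E5
```

The left argument picks the table used to interpret the bytes, so the right argument stays a plain integer vector in every case.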

Regards,
Elias

On 28 April 2014 12:49, David B. Lamkins <address@hidden> wrote:

That's close, but libfileio[8] returns a sequence of byte values, not code points.
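The difference shows up as soon as the text contains a non-ASCII character. The following Python sketch (an illustration only, assuming the file is UTF-8 encoded) compares the two views of the string used in the ⎕UCS example quoted below:

```python
text = "foo\u2349bar"  # "foo⍉bar"; U+2349 is the APL transpose symbol

# Code points: one integer per character, which is what ⎕UCS works with.
code_points = [ord(ch) for ch in text]

# Byte values: what a raw file read returns for UTF-8 encoded text.
byte_values = list(text.encode("utf-8"))

# Decoding the bytes recovers the original string.
round_trip = bytes(byte_values).decode("utf-8")
```

Here `code_points` has seven elements, with 9033 for ⍉, while `byte_values` has nine, because ⍉ occupies the three bytes 226 141 137 in UTF-8.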

On Mon, 2014-04-28 at 12:19 +0800, Elias Mårtenson wrote:
> Use the quad function ⎕UCS:
>
>
>       ⎕UCS 'foo⍉bar'
> 102 111 111 9033 98 97 114
>       ⎕UCS 102 111 111 9033 98 97 114
> foo⍉bar
>
>
> Regards,
> Elias
>
>
> On 28 April 2014 12:17, David B. Lamkins <address@hidden> wrote:
> I can use lib_file_io to read a sequence of byte values from a file containing Unicode text.
>
> How do I convert that sequence back to a Unicode string in GNU APL?
