From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] about iconv errro convert form iso-2022-jp to UTF-8 in linux
Date: Sun, 02 Oct 2016 13:41:22 +0200
User-agent: KMail/4.8.5 (Linux/3.8.0-44-generic; KDE/4.8.5; x86_64; ; )

Hi,
吴贵勇 wrote on 2015-12-01:
> 1. I have a question about using the Linux command "iconv" to convert from
> iso-2022-jp to UTF-8: I get an "invalid input sequence" error.
>
> The test file is d.txt; its content as hex data:
>
> 1b 24 42 38 2b 40 51 3d 71 21 21 1b 28 42 46 69 72 73 74 20 6f 6e 65 1b 24 42
> 2d 6a 21 21 36 53 3b 65 1b 28 42
>
> Viewed as ASCII, the data above looks like this:
>
> 1b 24 42 38 2b 40 51 3d 71 21 21 1b 28 42 46 69 72 73 74 20 6f 6e 65
> 1b 24 42 2d 6a 21 21 36 53 3b 65 1b 28 42
>
> escape $ B 8 + @ Q = q ! ! esc ( B F i r s t o n e
> esc $ B - j ! ! 6 S ; e esc ( B
>
> 2. In fact, the file d.txt comes from base64-decoding the string
>
> "=?ISO-2022-JP?B?GyRCOCtAUT1xISEbKEJGaXJzdCBvbmUbJEItaiEhNlM7ZRsoQg==?="
>
> I can view it correctly in Windows XP with Outlook Express: "見積書 First
> one㈱ 錦糸"
>
> But decoding the string fails on Linux; the character ㈱ cannot be decoded.
> If I delete the character ㈱ (2d 6a), the rest of the data decodes.
>
> 3. I recompiled "libiconv-1.14.tar.gz" to decode the file d.txt, but
> got the same failure.

Indeed, I can reproduce it with

  $ printf '\x1b\x24\x42\x2d\x6a\x1b\x28\x42' | iconv -f ISO-2022-JP-2
  iconv: (stdin):1:0: cannot convert

But in the (not yet released) git version of libiconv, we now have an encoding
ISO-2022-JP-MS (a.k.a. CP50221), which supports the Microsoft extensions to
ISO-2022-JP:

  $ printf '\x1b\x24\x42\x2d\x6a\x1b\x28\x42' | iconv -f ISO-2022-JP-MS
  ㈱

This encoding is, however, *not* contained in glibc's iconv. You'll have to
install GNU libiconv.
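
Where installing GNU libiconv is not an option, the mapping can be approximated by hand. The following is only a rough Python sketch, not how libiconv implements ISO-2022-JP-MS. It assumes the input uses only the two escape sequences that appear in d.txt (ESC $ B and ESC ( B), and that Python's "cp932" codec covers the NEC row 13 characters such as ㈱. The function names jis_to_sjis and decode_iso2022jp_ms are made up for this illustration. Each 7-bit JIS byte pair is converted to the equivalent Shift_JIS pair, which is then decoded as CP932:

```python
import base64

ESC_TWO_BYTE = b"\x1b$B"  # ESC $ B: switch to the two-byte JIS X 0208 set
ESC_ASCII = b"\x1b(B"     # ESC ( B: switch back to ASCII


def jis_to_sjis(j1, j2):
    """Map a 7-bit JIS byte pair (0x21..0x7E each) to a Shift_JIS pair."""
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:
        s1 += 0x40            # lead bytes continue at 0xE0 after 0x9F
    if j1 & 1:                # odd JIS row: first half of the trail range
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1           # 0x7F is never used as a trail byte
    else:                     # even JIS row: second half of the trail range
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def decode_iso2022jp_ms(data):
    """Hypothetical helper: handles only ESC $ B and ESC ( B, no error recovery."""
    out = []
    two_byte = False
    i = 0
    while i < len(data):
        if data.startswith(ESC_TWO_BYTE, i):
            two_byte = True
            i += 3
        elif data.startswith(ESC_ASCII, i):
            two_byte = False
            i += 3
        elif two_byte:
            # Assumption: cp932 knows the vendor extension rows (e.g. row 13).
            out.append(jis_to_sjis(data[i], data[i + 1]).decode("cp932"))
            i += 2
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Payload of the encoded-word quoted in the original report.
    payload = base64.b64decode(
        "GyRCOCtAUT1xISEbKEJGaXJzdCBvbmUbJEItaiEhNlM7ZRsoQg=="
    )
    print(decode_iso2022jp_ms(payload))
```

The pair 2d 6a is row 13 of the 94x94 grid, a row that plain JIS X 0208 leaves unassigned, which is why the standard converters reject it. The sketch maps it to the Shift_JIS pair 87 8a, where CP932 has ㈱.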
Bruno